Jacob Schnell

Machine Learning Researcher

University of Waterloo

Biography

I am a fourth-year student in Computer Science and Statistics at the University of Waterloo. My research interests include computer vision, diffusion models, and self-supervised learning. I currently lead Veer Renewables’ machine learning team, developing diffusion models that make renewable energy more efficient by super-resolving wind patterns to find the locations that maximize a wind turbine’s yield. I am also an accelerated master’s student in the VIP lab at Waterloo under the supervision of Dr. Yuhao Chen and Dr. Jesse Hoey.

Interests
  • Computer Vision
  • Diffusion Models
  • Large Language Models
  • Self-supervised Learning
Education
  • M.Math in Computer Science, 2024–2026

    University of Waterloo

  • B.Math in Computer Science and Statistics, 2020–2025

    University of Waterloo

Experience

MAGI Inc.
Machine Learning Engineering Intern
January 2025 – Present New York, NY
  • Joined MAGI’s machine learning team for the winter.

Veer Renewables
Lead Research Scientist
May 2024 – Present Vancouver, BC (Remote)
  • Leading Veer Renewables’ research and development team in developing a diffusion model that performs 16x super-resolution on wind data. The model produces high-quality reconstructions of wind patterns at 500x lower cost than traditional simulation methods. We are targeting publication.
  • Coordinated hiring for a team of undergraduate researchers and now manage four researchers, mentoring them and focusing their efforts on improving Veer’s diffusion models.
  • Currently improving the fidelity of reconstructions by leveraging conditional information, such as terrain elevation, and by exploring more sophisticated neural architectures and diffusion samplers.
  • Engineered distributed training and inference pipelines on AWS using PyTorch Distributed, AWS EC2, and Slurm.

GPTZero
Machine Learning Research Intern
May 2024 – August 2024 Toronto, ON
  • Trained and analyzed RoBERTa classifiers and metric-based classifiers to detect AI-written articles, with an emphasis on identifying failure cases and robustness to out-of-distribution data. Helped write a paper based on these results, currently under review.
  • Implemented feature fusion from Large Language Model (LLM) embeddings and ensembling with metric-based classifiers, decreasing the false positive rate on out-of-distribution samples by 21%. Developed four models that fuse LLM embeddings with a RoBERTa classifier.
  • Optimized the production pipeline for 13% faster model inference and developed a parallelized SQL-based dataset, yielding 30% faster model training.

Waabi Innovations Inc.
Machine Learning Research Intern
September 2023 – April 2024 Toronto, ON
  • Led a research project to develop a simpler map representation and embedding model for autonomous vehicle perception, prediction, and planning, under the guidance of Dr. Raquel Urtasun. We are targeting publication.
  • The novel representation and transformer-based architecture uses 14% fewer parameters, runs 20% faster, and achieves better downstream performance than the state-of-the-art LaneGCN architecture.
  • Devised several self-supervised pre-training tasks on vector maps, including masked autoencoding, contrastive learning, and query classification, improving performance and convergence speed in downstream training.

University of California Merced
Undergraduate Research Assistant
May 2023 – November 2023 Merced, CA (Remote)
  • First author of ScribbleGen, a novel method for generating synthetic training data for weakly-supervised semantic segmentation (WSSS). Using a ControlNet diffusion model, we generate synthetic images from segmentation scribbles; the resulting image-scribble pairs are used to train a WSSS model. Our research is available as an arXiv preprint.
  • Ensured consistency between labels and synthetic images using classifier-free guided diffusion and encode ratios.
  • Training with ScribbleGen improves mIoU by 2.3% on the full dataset and by 6.1% in a low-data setting, achieving a new state of the art on the PascalScribble WSSS dataset.

Waabi Innovations Inc.
Machine Learning Research Intern
January 2023 – May 2023 Toronto, ON
  • Developed a novel training scheme for vehicle motion forecasting models under the guidance of Dr. Raquel Urtasun.
  • In the new training scheme, motion forecasting is supervised through the trajectory planner to learn predictions that encourage safer driving plans. Training using the new scheme reduced the collision rate by 65%.
  • Further experimented with identifying important actors via self-attention weights, reweighting the loss by each vehicle’s importance to focus learning on safety-critical vehicles.

Flex A.I. Inc.
Machine Learning Engineering Intern
September 2021 – August 2022 Vancouver, BC (Remote)
  • Constructed a novel 4-stage machine learning architecture leveraging pre-trained YOLOv3 and AlphaPose models to identify user errors in workouts from videos. Used this to develop a new data pipeline to automatically process new examples and add them to our dataset.
  • Optimized the neural architecture and hyperparameters improving accuracy by 13.4% and training 12x faster compared to our baseline video model (a 3D CNN).
  • Developed a multitask image classification model identifying 17 errors across 5 different workout exercises, improving latency and accuracy compared to using individual models.

Recent Publications

ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation