Jacob Schnell

Machine Learning Researcher

University of Waterloo

Biography

I am a fourth-year student in Computer Science and Statistics at the University of Waterloo. My research interests include computer vision, diffusion models, and self-supervised learning. I currently lead Veer Renewables’ machine learning team, developing diffusion models that make renewable energy more efficient by super-resolving wind patterns to find the locations that maximize a wind turbine’s yield. I am also an accelerated master’s student in the VIP lab at Waterloo under the supervision of Dr. Yuhao Chen and Dr. Jesse Hoey.

Interests
  • Computer Vision
  • Diffusion Models
  • Large Language Models
  • Self-supervised Learning
Education
  • M.Math in Computer Science, 2024–2026

    University of Waterloo

  • B.Math in Computer Science and Statistics, 2020–2025

    University of Waterloo

Experience

MAGI Inc.
Machine Learning Engineering Intern
January 2025 – Present New York, NY
  • Joined MAGI’s machine learning team for the winter.

Veer Renewables
Lead Research Scientist
May 2024 – Present Vancouver, BC (Remote)
  • Leading Veer Renewables’ research and development team in developing a diffusion model that performs 16x super-resolution on wind data. The model produces high-quality reconstructions of wind patterns at 500x lower cost than traditional simulation methods. We are targeting publication.
  • Coordinated hiring for a team of undergraduate researchers and now manage four researchers, mentoring them and focusing their efforts on improving Veer’s diffusion models.
  • Currently improving the fidelity of reconstructions by leveraging conditional information, such as terrain elevation, and by exploring more sophisticated neural architectures and diffusion samplers.
  • Engineered distributed training and inference pipelines on AWS using PyTorch Distributed, AWS EC2, and Slurm.

GPTZero
Machine Learning Research Intern
May 2024 – August 2024 Toronto, ON
  • Trained and analyzed RoBERTa classifiers and metric-based classifiers to detect AI-written articles, with an emphasis on identifying failure cases and robustness to out-of-distribution data. Helped write a paper based on these results, currently under review.
  • Implemented feature fusion from Large Language Model (LLM) embeddings and ensembling with metric-based classifiers, decreasing the false positive rate on out-of-distribution samples by 21%. Developed four models that fuse LLM embeddings with a RoBERTa classifier.
  • Optimized the production pipeline for 13% faster model inference and developed a parallelized SQL-based dataset, yielding 30% faster model training.

Waabi Innovations Inc.
Machine Learning Research Intern
September 2023 – April 2024 Toronto, ON
  • Led a research project to develop a simpler map representation and embedding model for autonomous vehicle perception, prediction, and planning, under the guidance of Dr. Raquel Urtasun. We are targeting publication.
  • The novel representation and transformer-based architecture uses 14% fewer parameters, runs 20% faster, and achieves better downstream performance than the state-of-the-art LaneGCN architecture.
  • Devised several self-supervised pre-training tasks on vector maps, including masked autoencoding, contrastive learning, and query classification, improving performance and convergence speed in downstream training.

University of California Merced
Undergraduate Research Assistant
May 2023 – November 2023 Merced, CA (Remote)
  • First author of ScribbleGen, a novel method for generating synthetic training data for weakly-supervised semantic segmentation (WSSS). Using a ControlNet diffusion model, we generate synthetic images from segmentation scribbles; the resulting image-scribble pairs are used to train a WSSS model. Our research is available as an arXiv preprint.
  • Ensured consistency between labels and synthetic images using classifier-free guided diffusion and encode ratios.
  • Training with ScribbleGen improves mIoU by 2.3% on the full dataset and by 6.1% in a low-data setting, achieving a new state of the art on the PascalScribble WSSS dataset.

Waabi Innovations Inc.
Machine Learning Research Intern
January 2023 – May 2023 Toronto, ON
  • Developed a novel training scheme for vehicle motion forecasting models under the guidance of Dr. Raquel Urtasun.
  • In the new training scheme, motion forecasting is supervised through the trajectory planner to learn predictions that encourage safer driving plans. Training using the new scheme reduced the collision rate by 65%.
  • Further experimented with identifying important actors via self-attention weights, reweighting the loss by each vehicle’s importance to focus learning on safety-critical vehicles.

Flex A.I. Inc.
Machine Learning Engineering Intern
September 2021 – August 2022 Vancouver, BC (Remote)
  • Constructed a novel 4-stage machine learning architecture leveraging pre-trained YOLOv3 and AlphaPose models to identify user errors in workouts from videos. Used this to develop a new data pipeline to automatically process new examples and add them to our dataset.
  • Optimized the neural architecture and hyperparameters improving accuracy by 13.4% and training 12x faster compared to our baseline video model (a 3D CNN).
  • Developed a multitask image classification model identifying 17 errors across 5 different workout exercises, improving latency and accuracy compared to using individual models.

Recent Publications

ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation