Lecture notes
A guide to the course notes. On this page I will post the notes used during each lecture. These are slides, posted before the lecture; after each lecture, the annotated slides will be posted as well. One "Lecture" file may be used for more than one actual lecture. To avoid confusion, the unannotated notes posted here are titled Lecture I, Lecture II, etc., while the annotated notes posted after each lecture are titled L1a-jan6, L1p-jan6, L2a-jan8, etc.
Lecture templates
- Lecture 0 Overview of Machine Learning
- Lecture I Prediction: examples. The Nearest Neighbor (NN) predictor.
- Lecture I-1 Bias and Variance for the Nearest Neighbor (NN) predictor
- Lecture II Linear regression and classification. Loss Functions. Maximum likelihood. UPDATED 2/2
- Lecture III CART
- Lecture IV Neural Networks -- Part 1. Neural network predictors, basic concepts.
- Lecture IV Neural Networks -- Part 2. Backpropagation algorithm and practice (new section added 2/24)
- Lecture V Convolutional and residual networks, Transformers
- Lecture VI Autoencoders and Generative Models (edits still possible)
- Guest Lecture VI-2 Transformers (courtesy of P. Poupart)
- Lecture VII Clustering: K-means and Mixtures of Gaussians
- Lecture VIII Principal Component Analysis
Annotated lecture slides
- L1a-jan6, L1p-jan6 What is ML?
- L2a-jan8, L2p-jan8 Predictors by type of output. Nearest neighbor predictor.
- L3a-jan13, L3p-jan13 K-NN bias-variance tradeoff
- L4a-jan15, L4p-jan15 Losses. Linear regression by LS. (1/20/2026)
- L5a-jan22, L5p-jan22 Linear regression by ML (Updated 2/4)
- L6a-jan27, L6p-jan27 Perceptron, LDA, Logistic regression
- L7a-jan29, L7p-jan29 Gradient ascent/descent, CART
- L8a-feb3, L8p-feb3 CART, Neural networks 1 unit
- L9a-feb5, L9p-feb5 2 layer, multi-layer Neural networks
- L10a-feb10, L10p-feb10 2 layer, multi-layer Neural networks
- L11a-feb12, L11p-feb12 2 layer, multi-layer Neural networks
- L12a-feb24, L12p-feb24 SGD and Heavy Ball/momentum
- L13a-feb26, L13p-feb26 Overfitting. weight decay, dropout and normalization
- L14a-mar3, L14p-mar3 Adaptive learning rate, Convnets, Resnets
- L15a-mar5, L15p-mar5 Autoencoders/VAE
- L16a-mar10, L16p-mar10 Transformers and attention
- L17a-mar12, L17p-mar12 VAE (continued), Generative models
- L18a-mar17, L18p-mar17 Clustering. K-means
- L19a-mar19, L19p-mar19 EM algorithm
- L20a-mar24, L20p-mar24 Selecting K. PCA.
- L21a-mar26, L21p-mar26 PCA. Cross-Validation
(--a = 11:30-12:50 section, --p = 4-6:20 section)
Refresher materials
- Calculus and linear algebra by Haochen Sun here
- Probability and Statistics by Gavin Deane here. Sample tutorial problem solutions: Q1, Q2(a), Q2(b), Q3.
- Plotting data with matplotlib by Henry Lin here. Data here