This course introduces statistical learning theory, which provides the theoretical foundations of machine learning (and of artificial intelligence more generally). This is a second course in machine learning, and we assume that you have already taken an introductory machine learning class (such as CS 6140, DS 5220, or DS 4400). The course draws on material from several subjects, including learning theory, statistics, neural networks and deep learning, information theory, and reinforcement learning.
We will rely mainly on mathematical analysis to rigorously characterize the behavior of machine learning models and algorithms, though we will emphasize their practical implications throughout the course.
Prerequisites
Students are expected to be comfortable reading and writing proofs. In addition, we assume the following background:
Prior knowledge of basic calculus, probability, and linear algebra.
Completion of an introductory machine learning class (such as those listed above).
Week 1, Jan 6: Overview; Jan 8: Uniform convergence
What is the course about?
Basic setup of supervised learning, empirical risk minimization, and uniform convergence
Basic setup of neural networks
Statistical transfer learning
Learning finite, realizable hypothesis spaces
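For reference, here is a standard statement of the finite, realizable case (the notation is generic and may differ from lecture): if the hypothesis class \(\mathcal{H}\) is finite, the data are i.i.d., and some \(h^\star \in \mathcal{H}\) has zero population error, then any empirical risk minimizer \(\hat{h}\) with zero training error on \(n\) samples satisfies, with probability at least \(1 - \delta\),
\[
L(\hat{h}) \;\le\; \frac{\log|\mathcal{H}| + \log(1/\delta)}{n},
\]
which follows from a union bound over the hypotheses in \(\mathcal{H}\) whose population error exceeds the right-hand side.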
Week 2, Jan 13: Concentration estimates, Jan 15: Rademacher complexity
Markov's inequality, Chebyshev's inequality, and Chernoff bound
Moment generating function
Sub-Gaussian random variables
Rademacher complexity (definition and properties)
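As a quick reference for these lectures (standard statements, not specific to this course): for a nonnegative random variable \(X\) and \(t > 0\), Markov's inequality gives \(\Pr[X \ge t] \le \mathbb{E}[X]/t\); applying it to \(e^{\lambda X}\) yields the Chernoff bound \(\Pr[X \ge t] \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}]\) for every \(\lambda > 0\). A random variable \(X\) is \(\sigma\)-sub-Gaussian if \(\mathbb{E}[e^{\lambda (X - \mathbb{E}X)}] \le e^{\lambda^2 \sigma^2/2}\) for all \(\lambda \in \mathbb{R}\). The empirical Rademacher complexity of a function class \(\mathcal{F}\) on a sample \(S = (x_1, \dots, x_n)\) is
\[
\hat{\mathfrak{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\!\left[\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right],
\qquad \sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{\pm 1\}.
\]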
Week 3, Jan 22: Examples of Rademacher complexity
Learning finite hypothesis classes
L2/L1 norm constrained hypothesis classes
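Two representative bounds behind these examples (standard results; constants may differ from the lecture's conventions): Massart's finite lemma controls finite classes, and for norm-constrained linear predictors the complexity scales as \(1/\sqrt{n}\),
\[
\hat{\mathfrak{R}}_S(\mathcal{F}) \;\le\; \max_{f \in \mathcal{F}} \sqrt{\tfrac{1}{n}\sum_{i=1}^{n} f(x_i)^2} \;\cdot\; \sqrt{\frac{2 \log |\mathcal{F}|}{n}}
\quad (\mathcal{F} \text{ finite}),
\]
\[
\hat{\mathfrak{R}}_S\big(\{x \mapsto \langle w, x\rangle : \|w\|_2 \le B\}\big) \;\le\; \frac{B \max_i \|x_i\|_2}{\sqrt{n}} .
\]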
Week 4, Jan 27: Matrix completion, Jan 29: Two-layer neural networks
Wrapping up the proof of the Rademacher complexity-based generalization bound
Matrix completion
Path norm bounds for two-layer neural networks
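For a two-layer network \(f(x) = \sum_{j=1}^{m} a_j\, \sigma(\langle w_j, x\rangle)\) with ReLU activation \(\sigma\), one common definition of the path norm (conventions vary in the literature) is
\[
\|f\|_{\mathrm{path}} \;=\; \sum_{j=1}^{m} |a_j|\, \|w_j\|_1 ,
\]
and bounds of this type control the Rademacher complexity of the path-norm ball independently of the width \(m\).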
Week 5, Feb 3: VC dimension and over-parameterized neural networks, Feb 5: Neural tangent kernel
Shattering and VC dimension
Sample complexity of multi-layer ReLU networks
Neural tangent kernel
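Two standard definitions behind these lectures (notation is generic): a class \(\mathcal{H}\) of binary classifiers shatters a set \(\{x_1, \dots, x_m\}\) if every one of the \(2^m\) labelings of the points is realized by some \(h \in \mathcal{H}\), and the VC dimension is the size of the largest shattered set. The neural tangent kernel arises from linearizing a network around its initialization \(\theta_0\):
\[
f(x; \theta) \;\approx\; f(x; \theta_0) + \langle \nabla_\theta f(x; \theta_0),\, \theta - \theta_0 \rangle,
\qquad
K_{\mathrm{NTK}}(x, x') \;=\; \langle \nabla_\theta f(x; \theta_0),\, \nabla_\theta f(x'; \theta_0) \rangle .
\]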
Week 6, Feb 10: Implicit regularization in matrix sensing, Feb 12: From implicit regularization to benign overfitting
Over-parameterized matrix sensing
Gradient descent starting from a small initialization
Spectral decomposition
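A common way to set up the over-parameterized matrix sensing problem (the lecture's exact formulation may differ): given measurements \(y_i = \langle A_i, M^\star\rangle\) of a low-rank matrix \(M^\star \succeq 0\), run gradient descent on the factored objective
\[
\min_{U \in \mathbb{R}^{d \times d}} \; \frac{1}{4} \sum_{i=1}^{m} \big( \langle A_i, U U^\top \rangle - y_i \big)^2 ,
\qquad U_0 = \alpha\, U_{\mathrm{init}} \ \text{with small } \alpha > 0,
\]
where the small initialization scale \(\alpha\) drives the implicit bias toward low-rank solutions.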
Week 7, Feb 19: Wrapping up implicit regularization in matrix sensing
Error dynamics in rank-one matrix sensing
Benign overfitting in linear regression with minimum norm interpolation
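For the benign overfitting lecture, the object of study is typically the minimum-\(\ell_2\)-norm interpolator in the over-parameterized regime (\(d > n\), with \(X \in \mathbb{R}^{n \times d}\) of full row rank):
\[
\hat{\theta} \;=\; \arg\min \big\{ \|\theta\|_2 \;:\; X\theta = y \big\} \;=\; X^\top (X X^\top)^{-1} y .
\]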
There will be three homeworks, worth a total of 40% of the overall grade. Each homework should be completed and submitted individually.
The course project consists of an in-class presentation (40% of the total grade) and a final project (20% of the total grade).
There isn’t a single textbook that covers all of the lectures, though the following are good references for the course materials.
Statistical learning theory lecture notes, Percy Liang (Stanford)
Mathematical analysis of machine learning algorithms, Tong Zhang (UIUC)
Learning theory from first principles, Francis Bach (INRIA)