This course introduces statistical learning theory, which provides the theoretical foundations of machine learning (and of artificial intelligence more generally). This is a second course in machine learning, and we assume that you have already taken an introductory machine learning class (such as CS 6140, DS 5220, or DS 4400). The course draws on material from several subjects, including learning theory, statistics, neural networks and deep learning, information theory, and reinforcement learning.
We will rely mainly on mathematical analysis to rigorously characterize the behavior of machine learning models and algorithms, though we will emphasize their practical implications throughout the course.
Prerequisites
Students are expected to be comfortable reading and writing proofs. In addition, we assume the following background:
Prior knowledge of basic calculus, probability, and linear algebra.
Completion of an introductory machine learning class (such as those listed above).
Week 1, Jan 6: Overview; Jan 8: Uniform convergence
What is the course about?
Basic setup of supervised learning, empirical risk minimization, and uniform convergence
Basic setup of neural networks
Statistical transfer learning
Learning finite, realizable hypothesis spaces
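For reference, here is a standard statement of the finite, realizable case (the notation is generic and may differ from lecture): if the hypothesis class \(\mathcal{H}\) is finite, the data are i.i.d., and some \(h^\star \in \mathcal{H}\) has zero population error, then any empirical risk minimizer \(\hat{h}\) with zero training error on \(n\) samples satisfies, with probability at least \(1 - \delta\),
\[
L(\hat{h}) \;\le\; \frac{\log|\mathcal{H}| + \log(1/\delta)}{n},
\]
which follows from a union bound over the hypotheses in \(\mathcal{H}\) whose population error exceeds the right-hand side.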
Week 2, Jan 13: Concentration estimates, Jan 15: Rademacher complexity
Markov's inequality, Chebyshev's inequality, and Chernoff bound
Moment generating function
Sub-Gaussian random variables
Rademacher complexity (definition and properties)
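As a quick reference for these lectures (standard statements, not specific to this course): for a nonnegative random variable \(X\) and \(t > 0\), Markov's inequality gives \(\Pr[X \ge t] \le \mathbb{E}[X]/t\); applying it to \(e^{\lambda X}\) yields the Chernoff bound \(\Pr[X \ge t] \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}]\) for every \(\lambda > 0\). A random variable \(X\) is \(\sigma\)-sub-Gaussian if \(\mathbb{E}[e^{\lambda (X - \mathbb{E}X)}] \le e^{\lambda^2 \sigma^2/2}\) for all \(\lambda \in \mathbb{R}\). The empirical Rademacher complexity of a function class \(\mathcal{F}\) on a sample \(S = (x_1, \dots, x_n)\) is
\[
\hat{\mathfrak{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\!\left[\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right],
\qquad \sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{\pm 1\}.
\]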
Week 3, Jan 22: Examples of Rademacher complexity
Learning finite hypothesis classes
L2/L1 norm constrained hypothesis classes
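Two representative bounds behind these examples (standard results; constants may differ from the lecture's conventions): Massart's finite lemma controls finite classes, and for norm-constrained linear predictors the complexity scales as \(1/\sqrt{n}\),
\[
\hat{\mathfrak{R}}_S(\mathcal{F}) \;\le\; \max_{f \in \mathcal{F}} \sqrt{\tfrac{1}{n}\sum_{i=1}^{n} f(x_i)^2} \;\cdot\; \sqrt{\frac{2 \log |\mathcal{F}|}{n}}
\quad (\mathcal{F} \text{ finite}),
\]
\[
\hat{\mathfrak{R}}_S\big(\{x \mapsto \langle w, x\rangle : \|w\|_2 \le B\}\big) \;\le\; \frac{B \max_i \|x_i\|_2}{\sqrt{n}} .
\]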
Week 4, Jan 27: Matrix completion, Jan 29: Two-layer neural networks
Wrapping up the proof of the Rademacher complexity-based generalization bound
Matrix completion
Path norm bounds for two-layer neural networks
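For a two-layer network \(f(x) = \sum_{j=1}^{m} a_j\, \sigma(\langle w_j, x\rangle)\) with ReLU activation \(\sigma\), one common definition of the path norm (conventions vary in the literature) is
\[
\|f\|_{\mathrm{path}} \;=\; \sum_{j=1}^{m} |a_j|\, \|w_j\|_1 ,
\]
and bounds of this type control the Rademacher complexity of the path-norm ball independently of the width \(m\).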
Week 5, Feb 3: VC dimension and over-parameterized neural networks, Feb 5: Neural tangent kernel
Shattering and VC dimension
Sample complexity of multi-layer ReLU networks
Neural tangent kernel
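Two standard definitions behind these lectures (notation is generic): a class \(\mathcal{H}\) of binary classifiers shatters a set \(\{x_1, \dots, x_m\}\) if every one of the \(2^m\) labelings of the points is realized by some \(h \in \mathcal{H}\), and the VC dimension is the size of the largest shattered set. The neural tangent kernel arises from linearizing a network around its initialization \(\theta_0\):
\[
f(x; \theta) \;\approx\; f(x; \theta_0) + \langle \nabla_\theta f(x; \theta_0),\, \theta - \theta_0 \rangle,
\qquad
K_{\mathrm{NTK}}(x, x') \;=\; \langle \nabla_\theta f(x; \theta_0),\, \nabla_\theta f(x'; \theta_0) \rangle .
\]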
Week 6, Feb 10: Implicit regularization in matrix sensing, Feb 12: From implicit regularization to benign overfitting
Over-parameterized matrix sensing
Gradient descent starting from a small initialization
Spectral decomposition
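A common way to set up the over-parameterized matrix sensing problem (the lecture's exact formulation may differ): given measurements \(y_i = \langle A_i, M^\star\rangle\) of a low-rank matrix \(M^\star \succeq 0\), run gradient descent on the factored objective
\[
\min_{U \in \mathbb{R}^{d \times d}} \; \frac{1}{4} \sum_{i=1}^{m} \big( \langle A_i, U U^\top \rangle - y_i \big)^2 ,
\qquad U_0 = \alpha\, U_{\mathrm{init}} \ \text{with small } \alpha > 0,
\]
where the small initialization scale \(\alpha\) drives the implicit bias toward low-rank solutions.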
Week 7, Feb 19: Wrapping up implicit regularization in matrix sensing
Error dynamics in rank-one matrix sensing
Benign overfitting in linear regression with minimum norm interpolation
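For the benign overfitting lecture, the object of study is typically the minimum-\(\ell_2\)-norm interpolator in the over-parameterized regime (\(d > n\), with \(X \in \mathbb{R}^{n \times d}\) of full row rank):
\[
\hat{\theta} \;=\; \arg\min \big\{ \|\theta\|_2 \;:\; X\theta = y \big\} \;=\; X^\top (X X^\top)^{-1} y .
\]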
There will be three homeworks, worth a total of 40% of the overall grade. Each homework should be completed and submitted individually.
The course project consists of an in-class presentation (40% of the total grade) and a final project (20% of the total grade).
There isn’t a single textbook that covers all of the lectures, though the following are good references for the course materials.
Statistical learning theory lecture notes, Percy Liang (Stanford)
Mathematical analysis of machine learning algorithms, Tong Zhang (UIUC)
Learning theory from first principles, Francis Bach (INRIA)