Time & Place. WF 11:00-12:15, Siebel 1214 (used to be 4405).
Instructor. Matus Telgarsky (Office hours: Siebel 3212 (used to be 3336), M 5:00-7:00).
Homework. 50% of your grade. You may work alone or in a group of two. Homework must be \(\LaTeX\)-compiled and submitted through Gradescope (self-enrollment code 9J8G59). Full details appear below.
Project. 50% of your grade. Must contain some theoretical component. Full details appear below.
Academic integrity. Cheating in this class wastes everyone’s time; just take something else instead. Please see the full information below.
Discussion. Piazza; here’s the signup link.
Feedback. This course is experimental; not just new material and a new instructor, but new notes written in a new way (markdown+latex). Please provide feedback!
(9/10) I’m ceasing use of this “announcements” part of the website; please use piazza.
(8/28) Office hours are only 5-6pm on Monday, August 29. Also: I’ve gotten in the habit of using piazza for announcements since it generates notifications. Please be sure to set things up there.
(8/27) Class on September 28 is tentatively canceled; please consider attending the Allerton conference.
(8/27) Please be sure to sign up for piazza, where I’ll start threads for discussing lectures and notes.
(8/16) Hello, Friend.
Notes are posted following lectures; they lack some of the intuition, pictures, and discussion found in class, but may have somewhat more rigor. The schedule for future lectures is tentative (and perhaps too fast).
| Date | Topics | Notes | Coursework |
| --- | --- | --- | --- |
| 8/24 | Syllabus, philosophy, and a quick proof. | html, pdf. | hw\(0\) out: html, pdf, md. |
| Representation. | | | |
| 8/26 | Linear, linear with rich bases, lemma for next lecture. | html, pdf. | |
| 8/31 | Trees, boosted trees, branching programs, neural nets intro. | html, pdf. | hw\(0\) due. |
| 9/2 | 3 layer networks, 2 layer networks. | html, pdf. | |
| 9/7 | Benefits of depth, part 1. | html, pdf. | |
| 9/9 | Benefits of depth, part 2. | html, pdf. | hw\(1\) out: tex, pdf, bib. |
| Optimization. | | | |
| 9/14 | Convexity bootcamp part 1: basic objects. | html, pdf. | pm\(0\) due. |
| 9/16 | Convexity bootcamp part 2: duality. | html, pdf. | |
| 9/21 | SVM basics, representer theorem. | html, pdf. | |
| 9/23 | SVM recap, convex opt overview, Frank-Wolfe. | html, pdf. | |
| 9/28 | No class! Go to Allerton! | | |
| 9/30 | Smoothness and steepest descent. | html, pdf. | |
| 10/5 | Smoothness and steepest descent, part 2; AdaBoost intro. | html, pdf. | hw\(1\) due. |
| 10/7 | AdaBoost (from the steepest descent perspective). | html, pdf. | |
| 10/12 | Consistency of convex risk minimization, part 1. | html, pdf. | |
| 10/14 | Consistency of convex risk minimization, part 2. | html, pdf. | |
| 10/19 | Clustering bootcamp. | html, pdf. | |
| Generalization. | | | |
| 10/21 | Measure concentration bootcamp. | html, pdf. | |
| 10/26 | Finite classes, primitive covering numbers. | html, pdf. | |
| 10/28 | Symmetrization and Rademacher complexity. | html, pdf. | |
| 11/2 | Rademacher complexity properties: Lipschitz losses and finite classes. | html, pdf. | hw\(2\) out: tex, pdf. |
| 11/4 | Rademacher complexity properties: VC dimension and margins. | html, pdf. | |
| 11/9 | Covering and Rademacher bounds for neural networks. | html, pdf. | |
| 11/11 | VC dimension of linear threshold networks. | html, pdf. | |
| 11/16 | VC dimension of ReLU networks. | html, pdf. | |
| Miscellaneous. | | | |
| 11/18 | Fast rates. | | hw\(2\) due. hw\(3\) out: tex, pdf. |
| 11/30 | Non-convex gradient descent guarantees. | | |
| 12/2 | The Kolmogorov-Arnol’d Theorem. | | |
| 12/7 | Class cancelled. | | |
| Final presentations and homework. | | | |
| 12/7 | | | Due in Gradescope by \(11\)am: project writeup and slides. |
| 12/8 | Final presentations, 12-4pm, Siebel 3403. | | |
| 12/14 | | | hw\(3\) due. |
Omitted material.
Learning despite heavy tails. | ||||
Consistency of boosting. | ||||
Sparse recovery and the LASSO. | ||||
Active learning. | ||||
Spectral methods. | ||||
Online mirror descent. | ||||
SVMs and representation: universal kernels. | ||||
Content. The project must contain a theoretical component; whether you include anything else as well is up to you, and will not fundamentally affect the grade. Also, please focus on quality: if you can make something cleaner and shorter without removing information, that is preferred.
Other learning theory-ish classes. All of these courses are different, all have good material, and there are many I neglected to include!
Lieven Vandenberghe @ UCLA. This is not a learning theory course; it’s part 3 of a long optimization course, covering material not in the standard Boyd-Vandenberghe book. The lecture links are to slides; the proofs there are incredibly clean, and indeed this is my favorite resource for many of these methods.
Textbooks and surveys. Again, there are many others, but here are a key few.
Boucheron, Bousquet, and Lugosi have two excellent surveys: one is a gentler start, whereas the other even gets to some advanced topics. I read the easier one immediately upon entering grad school (despite having taken a good learning theory course in undergrad) and really liked it.
Shai Shalev-Shwartz and Shai Ben-David have an excellent textbook from 2014: “Understanding Machine Learning: From Theory to Algorithms”.
Shai Shalev-Shwartz also has an excellent monograph covering online learning, which will sadly receive little or no coverage in this learning theory course.
Devroye, Györfi, and Lugosi have a wonderful book from 1996, “A Probabilistic Theory of Pattern Recognition”, which in addition to covering standard topics in statistical learning theory, has many nice results which rarely enter courses, for instance Stone’s original argument for the consistency of \(k\)-nn in finite-dimensional Euclidean space.
Martin Anthony and Peter Bartlett have a wonderful 1999 book, “Neural Network Learning: Theoretical Foundations”, which is again in the setting of statistical learning theory, and is the only listed reference with extensive VC dimension bounds for neural networks. (Pro-tip: the cover image is not a biological image, it is a proof…)