Time & Place. MW 3:30–4:45, Siebel 1214.
Instructor. Matus Telgarsky (Office hours: Siebel 3212, W 4:45–6:00).
TA. Yucheng Chen (Office hours: Siebel 2107, M 5–6; M 5–7 when homework is due).
Evaluation. Homework is worth 80% of your grade, and the project 20%. All handed-in work must be \(\LaTeX\)-compiled, on time, and submitted through Gradescope (self-enrollment code 9GEERY). Homework and project details appear below.
Academic integrity. Cheating in this class wastes everyone’s time; please just take something else instead. See the full policy below.
Discussion and announcements. Piazza; here’s the signup link.
Feedback. Please reach out with any comments/concerns!
Notes are typed when possible, otherwise handwritten notes are scanned.
The schedule for future lectures is tentative.
Date.  Topics.  Notes.  Coursework. 
8/28  Administrivia; perceptron.  pdf.  hw0 out: tex, pdf. 
8/30  Perceptron; decomposition of learning problems.  pdf.  
Representation.  
9/6  Failure of linear; box apx (linear over boxes, decision trees).  pdf.  hw0 due! 
9/11  End of box apx: boosted decision trees, branching programs, 3-layer ReLU nets. Start of poly fit: Stone–Weierstrass!  pdf.  
9/13  Polynomial fit via Stone–Weierstrass: sums of exponentials, RBF kernels, 2-layer networks.  pdf.  
9/18  RKHS interlude.  pdf.  
9/20  RKHS remarks; tent maps and fractional parts.  pdf.  hw1 out: tex, pdf. 
9/25  Depth hierarchy theorems for ReLU networks; multiplication and polynomials with ReLU networks.  pdf.  
9/27  Wasserstein distance, probability modeling, and GANs.  pdf.  hw1v2: tex, pdf, diff. 
Optimization.  
10/2  Convexity I: sets, functions, subdifferentials, first-order conditions.  pen.  
10/4  Convexity II: conjugacy and duality.  pen.  
10/9  Gradient descent when smooth.  pen.  
10/11  Gradient descent when smooth, strongly convex.  pen.  
10/16  Gradient descent and noise.  pen.  
10/18  Maurey sparsification and Frank–Wolfe.  pen.  
10/23  Convex risk minimization and classification.  pen.  
10/25  Continuation; start of online learning.  pen.  
10/30  Online learning.  pen.  
Generalization.  
11/1  Concentration of measure.  pen.  
11/6  Finite classes and primitive covers.  pen.  
11/8  Symmetrization and Rademacher complexity.  pen.  hw2 out: tex, pdf. project proposal out: tex, pdf 
11/13  Properties of Rademacher complexity.  pen.  
11/15  Classification bounds.  pen.  
11/27  VC dimension of linear functions and linear threshold networks.  pen.  hw2v3: tex, pdf, diff pdf. 
11/29  VC dimension of ReLU networks.  pen.  
12/4 
12/6  Definitely no class!  
12/11  Rademacher and covering number bounds for neural networks I.  pen.  
12/13  Rademacher and covering number bounds for neural networks II.  hw3v2 out: tex, pdf.  
Final presentations and homework.  
12/14  Reading day: project presentations!  
Groups. You may work individually, or in pairs.
Content. The project must contain a theoretical component; whether you include something else as well is up to you, but will not fundamentally affect the grade. Also, please focus on quality; if you can make something cleaner and shorter without removing information, that is preferred.
Milestones. We’ll schedule meetings sometime in October so that I can sanity-check all projects.
Other learning-theory-ish classes. These courses are all different, all have good material, and there are many more I neglected to include!
Lieven Vandenberghe @ UCLA. This is not a learning theory course; it’s part 3 of a long optimization course, covering material not in the standard Boyd–Vandenberghe book. The lecture links are to slides; the proofs there are incredibly clean, and indeed this is my favorite resource for many of these methods.
Textbooks and surveys. Again, there are many others, but here are a key few.
Boucheron, Bousquet, and Lugosi have two excellent surveys: one is a gentler start, whereas the other even gets to some advanced topics. I read the easier one immediately upon entering grad school (despite having taken a good learning theory course in undergrad) and really liked it.
Shai Shalev-Shwartz and Shai Ben-David have an excellent textbook from 2014: “Understanding Machine Learning: From Theory to Algorithms”.
Shai Shalev-Shwartz also has an excellent monograph covering online learning, which will sadly receive little or no coverage in this learning theory course.
Devroye, Györfi, and Lugosi have a wonderful book from 1996, “A Probabilistic Theory of Pattern Recognition”, which in addition to covering standard topics in statistical learning theory, has many nice results which rarely enter courses, for instance Stone’s original argument for the consistency of \(k\)-nn in finite-dimensional Euclidean space.
Martin Anthony and Peter Bartlett have a wonderful 1999 book, “Neural Network Learning: Theoretical Foundations”, which is again in the setting of statistical learning theory, and is the only listed reference with extensive VC dimension bounds for neural networks. (Pro tip: the cover image is not a biological image, it is a proof…)