## Essentials

• Time & Place. MW 3:30-4:45, Siebel 1214.

• Instructor. Matus Telgarsky (Office hours: Siebel 3212, W 4:45-6:00).

• TA. Yucheng Chen (Office hours: Siebel 2107, M 5-6; M 5-7 when homework is due).

• Evaluation. Homework is 80% of your grade, project is 20% of your grade. All handed-in work must be $$\LaTeX$$-compiled, on time, and submitted through gradescope (self-enrollment code 9GEERY). Homework details and project details appear below.

• Academic integrity. Cheating in this class wastes everyone’s time; just take something else. Please see the full information below.

• Discussion and announcements. piazza, here’s the signup link.

## Schedule

• Notes are typed when possible; otherwise, handwritten notes are scanned.

• Schedule for future lectures is imprecise.

| Date | Topics | Notes | Coursework |
|---|---|---|---|
| 8/28 | Administrivia; perceptron. | pdf | hw0 out: tex, pdf. |
| 8/30 | Perceptron; decomposition of learning problems. | pdf | |
| Representation. | | | |
| 9/6 | Failure of linear; box apx (linear over boxes, decision trees). | pdf | hw0 due! |
| 9/11 | End of box apx: boosted decision trees, branching programs, 3-layer ReLU nets. Start of poly-fit: Stone-Weierstrass! | pdf | |
| 9/13 | Polynomial fit via Stone-Weierstrass: sums of exponentials, RBF kernels, 2-layer networks. | pdf | |
| 9/18 | RKHS interlude. | pdf | |
| 9/20 | RKHS remarks; tent maps and fractional parts. | pdf | hw1 out: tex, pdf. |
| 9/25 | Depth hierarchy theorems for ReLU networks; multiplication and polynomials with ReLU networks. | pdf | |
| 9/27 | Wasserstein distance, probability modeling, and GANs. | pdf | hw1v2: tex, pdf, diff. |
| Optimization. | | | |
| 10/2 | Convexity I: sets, functions, subdifferentials, first-order conditions. | pen | |
| 10/4 | Convexity II: conjugacy and duality. | pen | |
| 10/9 | Gradient descent when smooth. | pen | |
| 10/11 | Gradient descent when smooth, strongly convex. | pen | |
| 10/16 | Gradient descent and noise. | pen | |
| 10/18 | Maurey sparsification and Frank-Wolfe. | pen | |
| 10/23 | Convex risk minimization and classification. | pen | |
| 10/25 | Continuation; start of online learning. | pen | |
| 10/30 | Online learning. | pen | |
| Generalization. | | | |
| 11/1 | Concentration of measure. | pen | |
| 11/6 | Finite classes and primitive covers. | pen | |
| 11/8 | Symmetrization and Rademacher complexity. | pen | hw2 out: tex, pdf. Project proposal out: tex, pdf. |
| 11/13 | Properties of Rademacher complexity. | pen | |
| 11/15 | Classification bounds. | pen | |
| 11/27 | VC dimension of linear functions and linear threshold networks. | pen | hw2v3: tex, pdf, diffpdf. |
| 11/29 | VC dimension of ReLU networks. | pen | |
| 12/4 | ~~Possibly~~ Definitely no class! | | |
| 12/6 | Definitely no class! | | |
| 12/11 | Rademacher and covering number bounds for neural networks I. | pen | |
| 12/13 | Rademacher and covering number bounds for neural networks II. | | hw3v2 out: tex, pdf. |
| Final presentations and homework. | | | |
| 12/14 | Reading day: project presentations! | | |

## Homework policies

• Homework 0 is 5% of your grade. It should be easy.
• Groups.
  • Homework 0 must be completed individually.
  • For homeworks 1-3, everyone must submit an individual, unique handin on gradescope. You may discuss with up to 3 people; state their NetIDs on page 1 of the handin.
• Submission.
  • Homeworks must pass through a $$\LaTeX$$ compiler.
  • I recommend lshort as a $$\LaTeX$$ tutorial and rudimentary reference.
  • Electronic submission only through gradescope (self-enrollment code 9GEERY).
  • No late homework. In exchange, homework is graded promptly (within 1 week).
  • Homework is due at 3:30pm on the day it is due.
• There is absolutely no reason to cheat in an optional grad class; please do not waste your time or mine, and just drop instead.
• I prefer that you do not use outside resources. If you do, you must cite them, and you must still state everything in your own words.
• If we find possible cheating cases, we will immediately submit them to the department review board without fretting over it.

## Project policies

• Groups. You may work individually, or in pairs.

• Content. The project must contain a theoretical component; whether you include something else as well is up to you, but will not fundamentally affect the grade. Also, please focus on quality; if you can make something cleaner and shorter without removing information, that is preferred.

• Project themes. Here are some possible projects (concrete ideas will be sprinkled throughout the course):
  • A genuinely new, non-trivial theoretical result.
  • A clean-up of some complicated, confusing result; e.g., replacing a 20-page analysis with a 2-page analysis (in a way that isn’t obvious given other recent papers).
  • A survey which aims to unify and clarify relationships between the works it considers.

• Submission. You will both turn in a written report and give a presentation at the end of the semester.
  • The report must be at least 2 pages. This is vastly shorter than the homework; therefore, your submission should be high quality. A 2-page submission which consists of filler is not good.
  • Presentations consist of exactly 2 slides and take no more than 5 minutes; the first slide summarizes the topic, the second says something interesting you encountered.
  • Projects are handed in on gradescope just like everything else.
  • The idea behind the submission is anti-busywork; it’s short, but should be good!

• Milestones. We’ll schedule meetings some time in October so that I can sanity check all projects.

## Resources

Other learning theory-ish classes. All of these courses are different, and all have good material, and there are many I neglected to include!

• Lieven Vandenberghe @ UCLA. This is not a learning theory course; it’s part 3 of a long optimization course, covering material not in the standard Boyd-Vandenberghe book. The lecture links are to slides; the proofs there are incredibly clean, and this is my favorite resource for many of these methods.

Textbooks and surveys. Again, there are many others, but here are a few key ones.