Matus Telgarsky's Papers

Rate of Price Discovery in Iterative Combinatorial Auctions.
Jacob Abernethy, Sébastien Lahaie, Matus Telgarsky.
[arXiv]

Conference on Economics and Computation (EC), 2016.
Certain iterative combinatorial actions are equivalent to subgradient descent on appropriate supply-demand equations, with the flexibility of the pricing model controlling convergence rates; the stochastic bidding setting is standard to stochastic optimization, but the adversarial setting justifies the use of activity rules, a constraint on the players not found in stochastic optimization and learning theory.

Benefits of Depth in Neural Networks.
Matus Telgarsky.
[arXiv] [10min video]

Conference on Learning Theory (COLT), 2016.
For every positive integer \(k\), there exist deep neural networks with \(\Theta(k^3)\) layers and nodes which can not be approximated by any shallow network with \(\leq k\) layers and \(\leq 2^k\) nodes.
See also this simplified version (as below).

Convex Risk Minimization and Conditional Probability Estimation.
Matus Telgarsky, Miroslav Dudík, Robert Schapire.
[arXiv] [short video] [poster]

Conference on Learning Theory (COLT), 2015.
Even when the parameter space is ill-behaved (infinite dimensional, minima don't exist, not bounded or regularized), risk minimization of certain standard losses still converges to a unique object; in the finite dimensional case, uniform convergence (generalization) holds for empirical risk minimization.

Scalable Nonlinear Learning with Adaptive Polynomial Expansions.
Alekh Agarwal, Alina Beygelzimer, Daniel Hsu, John Langford, Matus Telgarsky.
[arXiv]

Advances in Neural Information Processing Systems (NIPS), 2014.
SGD variant which greedily grows monomial features of increasing degree.

Tensor decompositions for learning latent variable models.
Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky.
[arXiv] [jmlr]

Journal of Machine Learning Research (JMLR), 2014.
Analysis of tensor power iteration for tensors which have an orthogonal decomposition (plus some slight noise), and examples of latent variable models which can be written in this way (e.g., LDA).

Moment-based Uniform Deviation Bounds for \(k\)-means and Friends.
Matus Telgarsky, Sanjoy Dasgupta.
[pdf] [arXiv] [poster]

Advances in Neural Information Processing Systems (NIPS), 2013.
Generalization bounds for \(k\)-means cost and Gaussian mixture log-likelihood for unbounded parameter sets when the data has a few bounded moments (no boundedness or further modeling assumptions needed).

Boosting with the Logistic Loss is Consistent.
Matus Telgarsky.
[arXiv] [short video]

Conference on Learning Theory (COLT), 2013.
Optimization, generalization, and consistency guarantees for AdaBoost with logistic and similar losses.

Margins, Shrinkage, and Boosting.
Matus Telgarsky.
[arXiv] [video]

International Conference on Machine Learning (ICML), 2013.
AdaBoost, with a variety of losses, attains optimal margins by simply multiplying the step size with a small constant.

Agglomerative Bregman Clustering.
Matus Telgarsky, Sanjoy Dasgupta.
[pdf] [short video]

International Conference on Machine Learning (ICML), 2012.
Provides the natural algorithm, with attention to: handling degenerate clusters via smoothing, Bregman divergences for nondifferentiable convex functions, exponential families without minimality assumptions.

A Primal-Dual Convergence Analysis of Boosting.
Matus Telgarsky.
[arXiv] [jmlr]

Journal of Machine Learning Research (JMLR), 2012.
This is the extended version of the NIPS paper "The Fast Convergence of Boosting".

The Fast Convergence of Boosting.
Matus Telgarsky.
[pdf]

Advances in Neural Information Processing Systems (NIPS) 2011.
AdaBoost, with a variety of losses, minimizes its empirical risk at rate \(\mathcal O(\ln(1/\epsilon))\) when either weak learnable or possessing a minimizer, and rate \(\mathcal O(1/\epsilon)\) in general.

Hartigan's Method: \(k\)-means without Voronoi.
Matus Telgarsky, Andrea Vattani.
[pdf] [old javascript demo]

International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
Hartigan's method minimizes \(k\)-means cost point by point; it terminates when points lie within regions defined by intersections of spheres (rather than just Voronoi cells).

Signal decomposition using multiscale admixture models.
Matus Telgarsky, John Lafferty.

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2007.

Representation results & algorithms for deep feedforward networks.
Jacob Abernethy, Alex Kulesza, Matus Telgarsky.
[pdf]

Advances in Neural Information Processing Systems (NIPS) workshop on non-convex optimization, 2015.

Rate of price discovery in iterative combinatorial auctions.
Jacob Abernethy, Sébastien Lahaie, Matus Telgarsky.
[arXiv]

Conference on Auctions, Market Mechanisms and Their Applications (AMMA), 2015; currently under journal review.

Steepest descent analysis for unregularized linear prediction with strictly convex penalties.
Matus Telgarsky.
[pdf] [video]

Advances in Neural Information Processing Systems (NIPS) optimization workshop, 2011.
Adaptation of some of the boosting techniques to other optimization problems, for instance gradient descent of positive semi-definite quadratics.

Representation Benefits of Deep Feedforward Networks. [arXiv]

See also the full version (as above).

Dirichlet draws are sparse with high probability. [arXiv]

Greedy bi-criteria approximations for \(k\)-medians and \(k\)-means. (2016, with Daniel Hsu.) [arXiv]

Blackwell Approachability and Minimax Theory. (2011.) [arXiv]

Central Binomial Tail Bounds. (2009.) [arXiv]

Duality and Data Dependence in Boosting. (2013.) [pdf]

Committee: Sanjoy Dasgupta (chair/advisor), Kamalika Chaudhuri, Yoav Freund, Patrick Fitzsimmons, Philip Gill, Robert Schapire.
Contains work from NIPS 2011, JMLR 2012, and COLT 2013 as above.
The results of chapter 5 (whose proofs have some minor errors) appeared later in a vastly expanded form within "Convex risk minimization and conditional probability estimation" above.