February 24, Tuesday
12:00 – 14:00

More Data Less Work: Runtime as a Monotonically Decreasing Function of Data Set Size
Computer Science seminar
Lecturer : Nati Srebro
Lecturer homepage : http://ttic.uchicago.edu/~nati/
Affiliation : Toyota Technological Institute and University of Chicago
Location : 202/37
Host : Dr. Michel Elkin
We are used to studying the runtime of algorithms as an increasing function of the data set size, and are happy when this increase is not too bad (e.g. when the runtime increases linearly, or even polynomially, with the data set size). Traditional runtime analysis of learning is also viewed this way, and studies how training runtime increases as more data is available. However, considering the true objective of training, which is to obtain a good predictor, I will argue that training runtime should actually be studied as a *decreasing* function of training set size. Focusing on training Support Vector Machines (SVMs), I will then present both theoretical and empirical results demonstrating how a simple stochastic subgradient descent approach indeed displays such monotonically decreasing behavior. I will also discuss a similar phenomenon in the context of Gaussian mixture clustering, where it appears that excess data turns the problem from computationally intractable to computationally tractable.
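To make the abstract's method concrete: the "simple stochastic subgradient descent approach" for SVM training referred to here is, plausibly, a Pegasos-style update on the regularized hinge-loss objective (lambda/2)||w||^2 + hinge loss. Below is a minimal, hedged sketch of such an update rule; the function name, parameters, and step-size schedule are illustrative assumptions, not the speaker's exact algorithm. The intuition behind the talk's title is that the number of such updates needed to reach a fixed predictive accuracy does not grow with the training set size, and can even shrink when more data is available.

```python
import numpy as np

def pegasos_train(X, y, lam=0.1, n_iters=1000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM
    (illustrative sketch, not the speaker's exact algorithm).

    Minimizes (lam/2)*||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).
    X: (n, d) array of examples; y: (n,) array of labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)                # pick one example at random
        eta = 1.0 / (lam * t)              # standard decaying step size
        if y[i] * (X[i] @ w) < 1:          # hinge term is active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                              # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w
```

Each iteration touches a single example, so the cost per step is independent of n; under this (assumed) analysis, a fixed iteration budget suffices for a target accuracy regardless of how large the training set is.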

Joint work with Shai Shalev-Shwartz, Karthik Sridharan, Yoram Singer, Greg Shakhnarovich and Sam Roweis.