Rigorous Systems Research Group (RSRG) Seminar

Thursday February 28, 2019 12:00 PM

Understanding the Bias and Data Efficiency of Uncertainty Sampling

Speaker: Steve Mussmann, Computer Science, Stanford University
Location: Annenberg 213

Uncertainty Sampling is perhaps the most common active learning algorithm used in practice. In this talk, we describe two phenomena about Uncertainty Sampling and the corresponding theoretical explanations that we developed to explain these phenomena. First, Uncertainty Sampling with logistic regression on different datasets yields a wide range of data efficiencies (the factor reduction in the number of samples to reach the same error). We find both empirically and theoretically that this variation in data efficiency is inversely related to the error rate of the final classifier. Second, Uncertainty Sampling on a convex loss can converge to a wide variety of final error rates, and sometimes even lower than the error of random sampling, given infinite samples for a given dataset. We explain this phenomenon by finding that Uncertainty Sampling is implicitly optimizing the nonconvex zero-one loss.

Series Rigorous Systems Research Group (RSRG) Seminar Series

For more information visit: http://cms.caltech.edu/seminars