Special Seminar in Applied and Computational Mathematics
A major challenge in modern data analysis is to reliably and automatically discover hidden structure in data with little or no human intervention. Rather frustratingly, many mathematical abstractions of these problems are provably intractable in their most general form. Nevertheless, it may be possible to overcome these hardness barriers by focusing on realistic cases that rule out intractable instances. In this talk we focus on two concrete problems of this kind: subspace clustering and phase retrieval. We show that careful reformulations of these problems lead to tractable optimization methods that are provably accurate and rather effective on real data sets.
The first part of the talk focuses on extracting subspace structures from data.Subspace Clustering is the problem of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. As with most clustering problems, popular techniques for subspace clustering are often difficult to analyze theoretically and/or terminate in local optima of non-convex functions--these problems are only exacerbated in the presence of noise and missing data. We introduce a collection of subspace clustering algorithms, which are tractable and provably robust to various forms of data imperfections. We further illustrate our methods with numerical experiments on a wide variety of data segmentation problems.
In the second part of the talk we consider the question of recovering the seemingly hidden phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We study a physically realistic setup where one can modulate the signal of interest and then collect the intensity of its diffraction pattern. We show that a non-convex formulation of the problem recovers the phase information exactly from a number of near minimal random modulations. To solve this non-convex problem, we develop an iterative algorithm that combines a careful initialization together with a novel update that escapes all local minima and provably converges to the global optimum. Our proposed scheme is near optimal in terms of usage of computational and data resources.