Machine Learning and Scientific Computing Series
The empirical success of artificial neural networks over the past decade has posed a challenge to our understanding of statistical learning. This talk focuses on a non-convex optimization problem originating from the training of a neural network (the student) on data generated by a planted model (the teacher). This "student-teacher" model has frequently been used in theoretical investigations but remains difficult to analyze analytically. We show that in certain cases we can use the underlying symmetry to obtain precise analytic results, including the identification of many families of spurious minima, the representation of critical points of these families by computable power series in 1/√k (where k is the number of neurons), precise analytic descriptions of the associated Hessian spectrum for arbitrarily large k, and the asymptotics of the decay of the loss function at spurious minima. Using methods of forced symmetry breaking, it also becomes possible to describe, for example, the deformations of the landscape geometry that can lead to the formation or annihilation of spurious minima. The work reported here is part of a collaboration with Yossi Arjevani (Hebrew University, Israel).