Rigorous Systems Research Group (RSRG) Seminar
The mysterious ability of gradient-based optimization algorithms to solve the non-convex neural network training problem is one of the many unexplained puzzles behind the success of deep learning in various applications. A promising direction to explain this phenomenon is to study how initialization and overparametrization affect the convergence of training algorithms. In this talk, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1W_2\cdots W_L)$. We show that when $f$ satisfies the gradient dominance property, proper weight initialization leads to exponential convergence of the gradient flow to a global minimum of the loss. Moreover, the convergence rate depends on two trajectory-specific quantities that are controlled by the weight initialization: the \emph{imbalance matrices}, which measure the difference between the weights of adjacent layers, and the least singular value of the \emph{weight product} $W=W_1W_2\cdots W_L$. Our analysis provides improved rate bounds for several multi-layer network models studied in the literature, leading to novel characterizations of the effect of weight imbalance on the rate of convergence. Our results apply to most regression losses and extend to classification losses.
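For context, the display below is a minimal sketch of the setting described in the abstract: the gradient flow dynamics for the loss $f(W_1W_2\cdots W_L)$ and the standard conservation law satisfied by the layer-wise imbalance quantities. The label $D_i$ is introduced here only for illustration and may not match the notation used in the talk.

\begin{align*}
  \dot{W}_i &= -\nabla_{W_i} f(W_1 W_2 \cdots W_L)
             = -\bigl(W_1 \cdots W_{i-1}\bigr)^{\top}\,\nabla f(W)\,\bigl(W_{i+1} \cdots W_L\bigr)^{\top},
  \qquad i = 1,\dots,L,\\
  D_i &:= W_i^{\top} W_i - W_{i+1} W_{i+1}^{\top},
  \qquad i = 1,\dots,L-1,\\
  \frac{\mathrm{d}}{\mathrm{d}t} D_i &= 0 \quad \text{along the gradient flow,}
\end{align*}

so each imbalance matrix $D_i$ is fixed by the weight initialization and remains constant throughout training.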
Bio: Hancheng Min is a fourth-year PhD student in Electrical and Computer Engineering at Johns Hopkins University, advised by Enrique Mallada. His research interests include deep learning theory, reinforcement learning, and networked dynamical systems. His work has been supported through the MINDS Fellowship.