Machine Learning & Scientific Computing Series
Classic arguments from statistical learning theory, often formulated in terms of the bias-variance tradeoff, suggest that models with high capacity should overfit and therefore generalize poorly to unseen data. Deep neural networks (DNNs) appear to break this basic rule of statistics, because they perform best in the overparameterized regime. One way of formulating this conundrum is in terms of inductive bias: DNNs are highly expressive, and so can represent almost any function that fits a training data set. Why, then, are they biased towards functions that generalize well? The source of this inductive bias must arise from an interplay between network architecture, training algorithms, and structure in the data.
To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning for some simple classification problems, including Boolean functions, MNIST and CIFAR-10. We show that the DNN prior over functions is determined by the architecture, and is biased towards
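To give a concrete sense of the "prior over functions" idea, the following is a minimal sketch, not the speaker's actual setup: a hypothetical 3-input, one-hidden-layer MLP with i.i.d. Gaussian weights, where the prior P(f) over Boolean functions is estimated simply by sampling parameters at random and recording which truth table each sample computes.

```python
# Minimal sketch: estimate a small DNN's prior over Boolean functions by
# sampling random weights and counting which function each sample expresses.
# Hypothetical architecture and weight prior; chosen only for illustration.
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)

n_inputs = 3          # Boolean functions on {0,1}^3
n_hidden = 10
n_samples = 50_000    # parameter samples drawn from the prior

# All 2^3 = 8 input points, one per row.
X = np.array([[(i >> b) & 1 for b in range(n_inputs)]
              for i in range(2 ** n_inputs)], dtype=float)

counts = Counter()
for _ in range(n_samples):
    # Sample weights and biases from a standard Gaussian prior.
    W1 = rng.standard_normal((n_inputs, n_hidden))
    b1 = rng.standard_normal(n_hidden)
    W2 = rng.standard_normal(n_hidden)
    b2 = rng.standard_normal()

    # Forward pass; threshold the output to get a Boolean label per input.
    h = np.tanh(X @ W1 + b1)
    y = (h @ W2 + b2 > 0).astype(int)

    # Encode the sampled function as an 8-character truth-table string.
    counts["".join(map(str, y))] += 1

# Empirical prior P(f): under this kind of prior, a handful of functions
# (e.g. the constant ones) typically carry a large share of the mass.
for f, c in counts.most_common(5):
    print(f"f = {f}   P(f) ≈ {c / n_samples:.3f}")
print(f"distinct functions seen: {len(counts)} "
      f"of {2 ** (2 ** n_inputs)} possible")
```

Running this shows that even though the network can express a huge number of truth tables, random parameters land on a small subset of them far more often than on the rest, which is the sense in which the architecture alone already induces a strongly non-uniform prior.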
Recording: https://caltech.zoom.us/rec/share/8KJQ21y5kikJqrHDqSwQH3Fl8ra7sx7uV8nh4x3lRUUtyEbWvD8eO_56Hboj7eaT.F2mMqEy0VbeLtlea Passcode: K3GN.5+i