Speaker
Emmanuel Abbé
(EPFL)
Description
We consider the problem of learning sparse functions with uniform inputs on the Boolean hypercube. It is shown that algorithms based on training 2-layer mean-field neural networks with stochastic gradient descent can “optimally” learn such functions “iff” the function has a hierarchical property called the staircase property, which consists of the nonzero Fourier coefficients forming chains of increasing degree.
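For example, the staircase function f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3 on {−1, +1}^d has such a chain, with each monomial extending the previous one by a single coordinate, whereas an isolated high-degree parity such as f(x) = x_1 x_2 x_3 does not.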
This implies two separation results: (i) such neural networks outperform kernel methods; (ii) statistical query (SQ) algorithms outperform such neural networks.
Based on joint work with E. Boix (MIT) and T. Misiakiewicz (Stanford).