Description
Neural networks model complex phenomena within and beyond physics by leveraging their universal approximation capacity. However, this universality interferes with generalizability when combined with optimization-based training. The tension is exemplified by the prevalence of overfitting, in which naive optimization drives networks into overspecified representations that deviate from the underlying phenomena. Phenomena can be better represented by collections of networks, yet many so-called “ensemble”-based approaches are still undermined by the intrinsic difficulty of reconciling optimization-based training with noisy data. Nonetheless, the improvements observed with “ensemble”-based methods suggest that less haphazard ensemble-generation approaches leveraging physical insight could improve neural network training.
In this talk, we show that physics-based ensemble methods outperform optimization-based training. Rather than seeking optimal networks, our sufficient-training approach generates “good enough” weights and biases that, paradoxically, outperform leading optimization-based approaches. We resolve this apparent paradox using information theory, showing that sufficient training provides a minimally biased representation of the underlying phenomenon. Our results show that straightforward, physics-based approaches can supplant optimization for training transformers, feedforward networks, and convolutional neural networks, offering superior training performance and powerful insights for neural networks in physics applications and beyond.
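As a rough illustration of the ensemble idea only (the abstract does not detail the authors' actual sufficient-training procedure), the sketch below builds a collection of “good enough” networks without iterative optimization: each member draws random hidden weights and biases and fits only a linear readout in closed form, and predictions are averaged across members. The architecture, random-feature construction, and ridge readout are all illustrative assumptions, not details from the talk.

```python
# Illustrative stand-in for an ensemble of "good enough" networks:
# hidden weights and biases are drawn at random (never trained), and only
# the linear readout is fit in closed form. This avoids iterative
# optimization entirely; it is NOT the authors' method.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: noisy observations of an underlying phenomenon.
x = np.linspace(-1.0, 1.0, 64)[:, None]
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

def features(x, w, b):
    """Random tanh hidden layer; weights and biases stay fixed."""
    return np.tanh(x @ w + b)

def make_member(hidden=64, ridge=1e-3):
    """Draw random hidden parameters; fit the readout by ridge regression."""
    w = rng.standard_normal((1, hidden))
    b = rng.standard_normal(hidden)
    h = features(x, w, b)
    readout = np.linalg.solve(h.T @ h + ridge * np.eye(hidden), h.T @ y)
    return w, b, readout

def predict(member, x):
    w, b, readout = member
    return features(x, w, b) @ readout

# The ensemble prediction is the average over independently drawn members.
ensemble = [make_member() for _ in range(20)]
y_hat = np.mean([predict(m, x) for m in ensemble], axis=0)
print(f"ensemble training MSE: {np.mean((y_hat - y) ** 2):.4f}")
```

Averaging over many independently drawn members is what tempers the bias of any single random network; a single member may fit the noise idiosyncratically, but the ensemble mean tends toward the underlying signal.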
| Keyword-1 | Neural networks |
|---|---|
| Keyword-2 | Statistical physics |