## March 14, 2022

### Machine learning in and out of equilibrium

Michael Hinczewski, Shishir Adhikari, Alkan Kabakcioglu, Alexander Strang, and Deniz Yuret. March 2022. Bulletin of the American Physical Society.

Abstract: The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space, such as protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit. In contrast to its biophysical analogues, conventional SGD leads to a nonequilibrium stationary state exhibiting persistent currents in the space of network parameters. The effective loss landscape that determines the shape of this stationary distribution depends sensitively on training details, for example the choice to minibatch with or without replacement. We also demonstrate that the state satisfies the integral fluctuation theorem, a nonequilibrium generalization of the second law of thermodynamics. Finally, we introduce an alternative "thermalized" SGD procedure, designed to achieve an equilibrium stationary state. Deployed as a secondary training step, after conventional SGD has converged, thermalization is an efficient method to implement Bayesian machine learning, allowing us to estimate the posterior distribution of network predictions.
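To make the training detail mentioned in the abstract concrete, here is a minimal sketch of the two minibatch sampling schemes. It is not taken from the paper: the function names, dataset size, and batch size are hypothetical, chosen only for illustration.

```python
import random


def minibatches_without_replacement(n_samples, batch_size, rng):
    """One epoch: shuffle all indices once, then cut them into consecutive batches.

    Every sample index appears exactly once per epoch.
    """
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]


def minibatches_with_replacement(n_samples, batch_size, n_batches, rng):
    """Each index is drawn independently and uniformly.

    A sample may repeat within a batch or across batches, and some samples
    may not be visited at all.
    """
    return [
        [rng.randrange(n_samples) for _ in range(batch_size)]
        for _ in range(n_batches)
    ]


# Hypothetical sizes: 10 training samples, batches of 5.
rng = random.Random(0)
epoch_without = minibatches_without_replacement(10, 5, rng)
epoch_with = minibatches_with_replacement(10, 5, 2, rng)
```

In the first scheme the batches within an epoch are correlated, since together they must cover the dataset exactly once. In the second, every draw is independent. The two schemes therefore inject gradient noise with different statistics, which is the kind of training detail the abstract says can change the effective loss landscape.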