Why Neural Networks Follow a Common Learning Path

Summary: Researchers at the University of Pennsylvania have found that neural networks trained for image classification explore the same low-dimensional manifold as they learn, across a wide range of architectures, sizes, and training methods. Instead of relying on vastly different internal strategies, networks tend to discover the same essential features (for example, ears, eyes, or fur patterns) and move along a narrow manifold in probability space from initial ignorance to accurate classification.

This insight, derived from information-geometric analysis of many networks, suggests opportunities to design far more efficient training algorithms. By understanding the shared learning path, developers may be able to reduce the massive computational burden commonly associated with training state-of-the-art AI models, potentially enabling cheaper and faster deployment of image-classification systems across many fields.

Key Facts:

  1. Shared Learning Trajectory: Different neural networks (convolutional, residual, transformer-based, and multilayer perceptrons) tend to traverse the same low-dimensional manifold while learning image classification.
  2. Potential Efficiency Gains: The existence of a common manifold suggests that training algorithms could reach high accuracy with substantially less computation.
  3. Information Geometry Approach: Treating networks as probabilistic models allowed a like-for-like comparison and uncovered the manifold structure in prediction space.

Source: University of Pennsylvania

Penn engineers report an unexpected regularity in how deep networks learn, offering clues about why current methods work so well and how they might be improved.

Neural networks—software systems inspired by biological neurons—learn by iteratively adjusting millions or billions of parameters so they can make accurate predictions on new data. Today these models are widely used in medicine, astronomy, robotics, and many other areas to identify patterns and make decisions from images and other inputs.

Discovering an algorithm that will consistently find the path needed to train a neural network to classify images using just a handful of inputs is an unresolved challenge. Credit: Neuroscience News

In a paper published in the Proceedings of the National Academy of Sciences (PNAS), Pratik Chaudhari, Assistant Professor in Electrical and Systems Engineering and a core faculty member of the GRASP Lab, James Sethna of Cornell University, and lead author Jialin Mao, a doctoral student at Penn, show that networks trained on image classification tasks move along essentially the same narrow manifold in the high-dimensional prediction space.

“Imagine a task of distinguishing cats and dogs,” Chaudhari explains. “Different people might focus on whiskers, ear shapes, or fur markings. You would expect different networks to use pixels in different ways and some to perform better than others. Yet we see a strong commonality: the networks converge by extracting the same low-dimensional features.”

The team analyzed hundreds of thousands of models spanning many architectures, training recipes, initialization schemes, and data-augmentation and regularization methods. Using tools from information geometry, a field that applies geometry to statistics, they treated each model at each stage of training as a probability distribution over its predictions, which put trajectories from very different networks in a common space where they could be compared rigorously. The result: the output probabilities clustered along thin manifolds in very high-dimensional spaces, and distinct networks followed similar trajectories along those manifolds.
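
To make the idea of comparing networks as probability distributions concrete, here is a minimal Python sketch. The article does not say which distance the authors used; this example assumes the Bhattacharyya distance, a standard way to measure overlap between two distributions, averaged over images. The function name and the toy prediction values are hypothetical and are not taken from the study.

```python
import math


def bhattacharyya_distance(preds_a, preds_b):
    """Average per-sample Bhattacharyya distance between two models.

    preds_a, preds_b: lists of class-probability lists, one per image,
    produced by evaluating both models on the same images in the same order.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("both models must be evaluated on the same samples")
    total = 0.0
    for p, q in zip(preds_a, preds_b):
        # Bhattacharyya coefficient: 1.0 when the two distributions match,
        # 0.0 when they put no probability on any shared class.
        overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
        # Clamp to avoid log(0) and rounding slightly above 1.
        overlap = min(max(overlap, 1e-12), 1.0)
        total += -math.log(overlap)
    return total / len(preds_a)


# Made-up predictions for three images over two classes (cat, dog).
model_a = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
model_b = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]

print(bhattacharyya_distance(model_a, model_a))  # identical predictions
print(bhattacharyya_distance(model_a, model_b))  # similar predictions
```

Because the comparison uses only output probabilities, it applies equally to a convolutional network and a transformer, which is what allows networks with unrelated parameters to be placed in one space.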

Chaudhari offers two explanations for why this happens. First, natural images occupy a very small, structured subset of all possible pixel arrangements: everyday photographs of objects are far from random noise. Second, labels used for classification group objects into broad, human-defined categories. Distinguishing those categories often depends on a few meaningful features (for instance, the presence of ears or a particular eye shape), so networks naturally discover and rely on low-dimensional attributes rather than every pixel.

These findings imply that many details of network design—exact layer counts, precise hyperparameters, or some optimization choices—may matter less than previously thought, at least for image classification. Larger models often traverse the same manifold as smaller ones but do so more quickly, and models initialized in very different parts of the prediction space still converge along similar routes.
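
The claim that a larger model follows the same route "more quickly" can be illustrated with a toy calculation. The Python sketch below tracks how far each checkpoint's predictions are from the true labels, again assuming the Bhattacharyya distance (an assumption, not something the article specifies). The checkpoint values are invented for illustration and are not results from the paper; `distance_to_truth`, `small_model`, and `large_model` are hypothetical names.

```python
import math


def mean_bhattacharyya(preds_a, preds_b):
    """Average per-sample Bhattacharyya distance between two prediction sets."""
    total = 0.0
    for p, q in zip(preds_a, preds_b):
        overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
        total += -math.log(min(max(overlap, 1e-12), 1.0))
    return total / len(preds_a)


def distance_to_truth(preds, labels):
    """Distance from a checkpoint's predictions to the one-hot true labels."""
    num_classes = len(preds[0])
    truth = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]
    return mean_bhattacharyya(preds, truth)


labels = [0, 1]  # toy data: image 0 is a cat (class 0), image 1 is a dog (class 1)

# Invented checkpoints: each entry holds the predictions for both images.
# Both models start at "ignorance" (uniform predictions).
small_model = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.6, 0.4], [0.4, 0.6]],
    [[0.7, 0.3], [0.3, 0.7]],
]
large_model = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.7, 0.3], [0.3, 0.7]],
    [[0.9, 0.1], [0.1, 0.9]],
]

for name, checkpoints in (("small", small_model), ("large", large_model)):
    distances = [distance_to_truth(p, labels) for p in checkpoints]
    print(name, [round(d, 3) for d in distances])
```

In this toy setup the large model's second checkpoint sits at the same point the small model reaches only at its third, so both pass through the same predictions but one gets there in fewer steps.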

The practical implication is significant: if researchers can learn to identify or directly exploit this shared low-dimensional path, they may be able to develop training procedures that reach comparable accuracy with far fewer resources. “This is the billion-dollar question,” Chaudhari says. “Can we train neural networks cheaply? Our results suggest that might be possible, but the algorithmic breakthrough to do so consistently is still unknown.”

Funding: The study was conducted at the University of Pennsylvania School of Engineering and Applied Science and Cornell University. It was supported by grants from the National Science Foundation, the National Institutes of Health, and the Office of Naval Research, by the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, and by cloud computing credits from Amazon Web Services.

Other co-authors include Rahul Ramesh and Rubing Yang at the University of Pennsylvania; Itay Griniasty and Han Kheng Teoh at Cornell University; and Mark K. Transtrum at Brigham Young University.

About this AI research news

Author: Ian Scheffler
Contact: Ian Scheffler – University of Pennsylvania

Original Research: Closed access. “The training process of many deep networks explores the same low-dimensional manifold” by Jialin Mao et al. PNAS


Abstract

The training process of many deep networks explores the same low-dimensional manifold

Using information-geometric techniques to analyze prediction trajectories, the authors show that training deep networks explores an effectively low-dimensional manifold in prediction space. Networks with diverse architectures, sizes, optimization methods, regularization, data augmentation, and initializations lie on the same manifold. While different architectures follow distinguishable trajectories, other factors have minimal influence: larger networks often train along a similar manifold to smaller ones, only faster, and networks initialized in different regions of prediction space converge along the same low-dimensional route.