AI Reveals How Languages Evolve to Aid Learning

Summary: A new study identifies the architectural and evolutionary principles that shape how children and artificial neural networks acquire language. Bridging cognitive linguistics and deep learning, the research demonstrates how iterated learning—where language is transmitted across generations—causes language to grow more structured, making it easier to learn.

Using a deep linear neural network designed to mirror a child’s staged learning, the investigators show that structural regularities naturally emerge from the tension between communication demands and imperfect transmission across learners.

Key Facts

The Iterated Evolution Paradigm: Iterated learning treats language as an evolving system. As language passes from one generation to the next, pressures to communicate efficiently and to be learnable reshape it toward greater structure and regularity.
The Child-Brain Simulation: The research team built deep linear networks with learning dynamics analogous to those of young children and exposed successive network generations to datasets resembling natural language input.
Error-Driven Refinement: Children often overgeneralize early rules (for example, assuming that all birds fly until they encounter penguins). Because these errors are systematic rather than random, they shape what is transmitted: predictable, rule-based portions of language persist, while irregular, difficult-to-learn aspects are gradually lost.
Depth Matters: The experiments demonstrate that iterated learning produces compositional structure only when networks have sufficient depth—multiple processing layers that allow hierarchical organization. Shallow networks with few layers fail to recover the structured regularities that support learnability.
Relevance to Modern AI: The study suggests that similar principles may underlie the emergent structure observed in large generative AI systems. Both the architecture of the learner and the complexity of the environment determine how effectively language patterns are absorbed and passed on.
Cross-Disciplinary Insight: Lead author Dr. Devon Jarvis emphasizes that combining iterated learning theory with deep linear network analysis reveals why language evolves to be learnable—specifically, because children learn in stages and prefer reusing reliable patterns.

Source: University of the Witwatersrand

New research at the University of the Witwatersrand in South Africa clarifies connections between human language development and the behaviour of large-scale AI language models.

Culture and intergenerational transmission are central to this account. Iterated learning describes a process in which language changes over generations, whether human or computational, driven by pressures to be both expressive and learnable.

“We constructed a computational model with learning properties similar to a child’s and compared its behaviour to observations from child development,” explains lead author Dr. Devon Jarvis, Lecturer in the School of Computer Science and Applied Mathematics and Fellow at the Wits Machine Intelligence and Neural Discovery Institute. “We then trained successive versions of that model on data with the statistical properties of human language and observed how the system’s ‘language’ evolved.”

The paper, “Compositionality and Systematicity Emerge from Iterated Learning in Deep Linear Networks,” appears in the journal Proceedings of the National Academy of Sciences (PNAS).

Childhood learning as the engine of structure

Children acquire concepts in hierarchical stages: they begin with broad distinctions, such as plants versus animals, and then learn finer categories and exceptions. Early generalizations lead to predictable errors that are corrected later. For example, a child may initially infer that all birds fly and then revise the rule after encountering penguins. Such systematic mistakes act as filters during transmission: they favor reusable rules and trim away irregular, hard-to-learn details.

“When parents pass language to children and those children later become parents, transmission inevitably introduces errors,” Jarvis says. “Because many of those errors come from overgeneralization rather than randomness, they favor retention of simple, compositional patterns while the messier parts of language fade.”
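For readers who want to see this filtering effect in miniature, the sketch below is a toy illustration and not the authors' code. It assumes one-hot inputs, a two-layer linear network trained by plain gradient descent for a fixed number of steps, and a made-up "language" containing one strong shared pattern and one weak, idiosyncratic pattern. The names train_two_layer_linear, strengths, and history are hypothetical. Each generation learns from the outputs of the previous one. The imperfection here comes from a limited training budget, which is only a stand-in assumption for the overgeneralization errors described above.

```python
import numpy as np

def train_two_layer_linear(X, Y, hidden=16, steps=200, lr=0.05,
                           init_scale=1e-3, seed=0):
    """Fit Y ~ X @ W1 @ W2 by gradient descent from a small random start.

    Returns the learned input-output map W1 @ W2.
    """
    rng = np.random.default_rng(seed)
    W1 = init_scale * rng.standard_normal((X.shape[1], hidden))
    W2 = init_scale * rng.standard_normal((hidden, Y.shape[1]))
    for _ in range(steps):
        H = X @ W1
        err = H @ W2 - Y                  # residual of the squared-error loss
        grad_W2 = H.T @ err
        grad_W1 = X.T @ (err @ W2.T)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1 @ W2

# Toy "language" (an assumption): 8 items with one-hot inputs, 6 output features.
rng = np.random.default_rng(42)
U, _ = np.linalg.qr(rng.standard_normal((8, 2)))   # item-side directions
V, _ = np.linalg.qr(rng.standard_normal((6, 2)))   # feature-side directions
strengths = np.array([3.0, 0.5])                   # strong pattern, weak pattern
X = np.eye(8)
Y = U @ np.diag(strengths) @ V.T

history = []
for generation in range(5):
    # A fresh learner is trained on the previous generation's outputs.
    W = train_two_layer_linear(X, Y, seed=generation)
    Y = X @ W                                      # becomes the next generation's data
    # Fraction of each original pattern that is still present.
    retained = np.array([U[:, k] @ Y @ V[:, k] for k in range(2)]) / strengths
    history.append(retained)
    print(f"generation {generation + 1}: "
          f"strong={retained[0]:.3f}, weak={retained[1]:.3f}")
```

Under these assumptions the strong pattern should survive transmission largely intact while the weak one shrinks from one generation to the next. The step count and pattern strengths are arbitrary choices, and changing them changes how quickly the weak pattern fades.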

Network architecture shapes what can be learned

The study uses deep linear networks—mathematical models that retain analytic clarity while capturing multi-layer processing—to probe the neural basis of iterated learning. The results show that depth is essential: only sufficiently deep architectures can discover and preserve compositional structure through iterated transmission. Shallow networks lack the hierarchical capacity to encode the same regularities and therefore fail to generalize in the same way.
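One way to see why depth could matter is through closed-form learning curves known from earlier analyses of linear networks by Saxe and colleagues, which this study builds on. The sketch below is not taken from the new paper. It assumes whitened inputs, a small balanced starting strength a0 for the two-layer case, and a time constant tau; the function names and the example values are illustrative. In that earlier work, a two-layer linear network learns a pattern of strength s along a sigmoidal curve whose timing depends on s, whereas a network with a single weight matrix approaches every pattern at the same relative rate.

```python
import numpy as np

def deep_mode_strength(s, t, a0=1e-4, tau=1.0):
    """Two-layer linear network: solves tau * da/dt = 2 * a * (s - a)."""
    growth = np.exp(2.0 * s * t / tau)
    return s * growth / (growth - 1.0 + s / a0)

def shallow_mode_strength(s, t, b0=0.0, tau=1.0):
    """Single-matrix linear network: solves tau * db/dt = s - b."""
    decay = np.exp(-t / tau)
    return s * (1.0 - decay) + b0 * decay

s_strong, s_weak = 3.0, 0.5   # assumed strengths of a shared and a rare pattern
t = 2.5                       # assumed training budget for one learner

# Fraction of each pattern acquired within the budget.
deep_fraction = {
    "strong": deep_mode_strength(s_strong, t) / s_strong,
    "weak": deep_mode_strength(s_weak, t) / s_weak,
}
shallow_fraction = {
    "strong": shallow_mode_strength(s_strong, t) / s_strong,
    "weak": shallow_mode_strength(s_weak, t) / s_weak,
}
print("deep:   ", deep_fraction)
print("shallow:", shallow_fraction)
```

With these example values, the two-layer learner has acquired almost all of the strong pattern and very little of the weak one, while the single-matrix learner has acquired the same fraction of both. That staged, strength-dependent learning is what lets a time-limited deep learner act as a filter. Whether this matches the exact argument in the paper should be checked against the article itself.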

This insight links developmental cognitive science with contemporary AI advances: large generative models rely on layered architectures and vast data to produce their emergent capabilities. The study suggests these models and human learners are shaped by similar constraints and pressures.

“Elements of this framework—deep linear networks and iterated learning—have existed separately in different literatures,” Jarvis notes. “Combining them shows a clear mechanism: language evolves to be learnable because children learn in stages and preferentially reuse structured information rather than memorizing exceptions.”

Key Questions Answered:

Q: Why do children’s grammatical mistakes help language become easier to learn over generations?

A: Those mistakes are systematic, not random. Overgeneralizations reflect an attempt to impose order on data. As language is transmitted across generations, irregular, hard-to-learn features tend to be forgotten while rule-based, reusable structures are retained, making the language progressively easier to learn.

Q: How do shallow and deep networks differ when learning language?

A: The main difference is hierarchical processing capacity. Deep networks, with multiple layers, can form nested representations and extract compositional structure; shallow networks lack this depth and therefore miss hidden regularities needed to transmit structured language effectively.

Q: How does this research relate to the surge in generative AI?

A: It suggests that similar factors, namely layered processing and large, structured input, shape emergent structure in both human language and modern AI. Even a simple deep linear model can reproduce the pathway by which compositional language arises through iterated learning.

Editorial Notes:

This article was edited by a Neuroscience News editor.
The journal article was reviewed in full.
Additional context was added by the editorial staff.

About this AI and language learning research news

Author: Shirona Patel
Source: University of the Witwatersrand
Contact: Shirona Patel – University of the Witwatersrand

Original Research: Open access. “Compositionality and systematicity emerge from iterated learning in deep linear networks” by Devon Jarvis, Richard Klein, Benjamin Rosman, and Andrew M. Saxe. PNAS
DOI: 10.1073/pnas.2509739123

Abstract

Compositionality and systematicity emerge from iterated learning in deep linear networks

Humans systematically generalize by recombining aspects of prior experience, and language exemplifies this capacity. Iterated learning—where each generation learns from the outputs of the previous—has been shown to refine communicative systems toward compositional structure.

This study provides a theoretical examination of how compositional language and systematic generalization arise in simple neural networks. Building on prior analyses of linear networks, the authors derive exact learning dynamics across generations for both shallow and deep models and refine the notion of systematicity to clarify the benefits and limits of iterated learning.

Results indicate that iterated learning promotes systematic generalization by uncovering compositional substructures in output labels. Multiple generations are necessary for compositional patterns to emerge robustly; under certain conditions, this multi-generation process can outperform single-generation training with optimal stopping. However, to ignore irrelevant input features and generalize robustly, networks may require very large datasets. The authors therefore introduce the concept of “weak systematic generalization” to capture how scale contributes to emergent systematicity.