How Language Evolves for Learnability in AI and Humans

Language, as we understand it today, is not merely a static, fixed system handed down across generations through abstract cultural tradition. It is a dynamic system that reshapes itself as it passes from one generation of speakers to the next, a process cognitive scientists study under the name "iterated learning." In iterated learning, information is passed from one learner to the next, and each individual interprets and reproduces what they have learned through their own cognitive filters. This process has profound implications: rather than adapting solely to the communicative needs of adults, language also adapts to the learning constraints and biases of children.

In this article, I will explore the crucial intersection between children's early language development and the training dynamics of deep neural networks. By examining recent research conducted by Dr. Devon Jarvis of the Wits Machine Intelligence and Neural Discovery (MIND) Institute, we can begin to see how structural regularities in language are not accidental but functional, evolved features. The study demonstrates that language structure emerges naturally from the transmission process itself—a powerful insight with implications spanning cognitive science, linguistics, and artificial intelligence.

At the heart of this research is the principle that language is optimized for learnability. This means that over evolutionary and developmental time, linguistic systems have been shaped by a selection pressure: structures that are easier for learners to acquire are more likely to be transmitted successfully. The study models this process computationally, showing how linguistic structure can arise purely from the constraints of iterative transmission across successive "generations" of learners. This approach provides a rigorous framework for understanding why human language exhibits certain regularities—why we prefer hierarchical structures, why meaning often maps systematically to form, and why children can acquire such complex systems so quickly.

The key realization is that learnability is not just a convenient property; it is an evolutionary necessity. Without mechanisms for efficient learning, language would remain inaccessible to new generations, and communication systems would collapse. This principle explains why children's errors, often dismissed as simple mistakes, are actually structured, predictable deviations that reveal the underlying learning strategies both humans and artificial systems employ. When we trace these patterns across domains, a unified picture begins to emerge: similar mathematical principles appear to govern how children learn their first words and how neural networks learn to generate coherent text.

This article will unpack these principles step by step. We will examine how the researchers modeled children's learning dynamics in artificial neural networks, what architectural features proved essential for successful language evolution, and how "non-arbitrary mistakes" serve as critical signposts guiding the emergence of structured communication. Throughout, we'll draw connections between cognitive science and modern AI research, revealing how insights from one domain can illuminate challenges in the other.

The stakes are high: if language evolves for learnability, then improving AI systems may require understanding how humans learned to speak in the first place. The work of Dr. Jarvis and the MIND Institute offers not just a computational model but a framework for thinking about intelligence itself—one where structure emerges not from top-down design but from the bottom-up pressures of transmission, error, and adaptation.

Methodology: Modeling the Child Brain

To investigate how language structure emerges, Dr. Jarvis and his team at the Wits MIND Institute developed a controlled computational framework that simulates iterated learning across artificial neural networks. The research, formally titled "Compositionality and Systematicity Emerge from Iterated Learning in Deep Linear Networks" and published in the Proceedings of the National Academy of Sciences (PNAS), employed a sophisticated simulation approach that mirrored key developmental characteristics observed in human children.

The methodology rested on a simple yet powerful premise: if language evolves through iterated learning in human cognition, can we recreate this process in silico using deep learning architectures? The answer, as demonstrated by the study, is a resounding yes—provided that the learner possesses sufficient architectural complexity.

The Iterated Learning Pipeline

The researchers designed a pipeline in which successive generations of neural networks learned to process and reproduce linguistic patterns. Each "generation" consisted of a network trained on data that had been passed through a previous learner, introducing controlled errors and variations mimicking the imperfections inherent in human transmission. These errors were not random noise but specific, structured perturbations designed to reflect the types of mistakes children make when acquiring language—for instance, overgeneralizing grammatical rules or misremembering phonological details.

The process unfolded in distinct phases:

Founder Generation: The initial network was pretrained on raw linguistic data drawn from representative corpora.
Transmission Phase: The outputs of each generation were passed to the next as training data, with every transmission step incorporating controlled levels of stochasticity.
Evaluation Phase: Each new generation was tested on compositional generalization tasks—measuring whether the network could apply learned patterns to novel combinations of familiar elements.
Selection Phase: Networks demonstrating greater systematicity and compositional success were more likely to become the basis for subsequent generations.

This iterative loop allowed the researchers to observe, generation by generation, how linguistic structure emerged from noise and imperfection.
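The pipeline above can be sketched as a short loop. This is a toy stand-in, not the study's code: the meaning space (three colors and three shapes), the two-letter words, the `bottleneck` and `noise` parameters, and the majority-vote `learn` function are all hypothetical. The founder is a random language rather than a network pretrained on corpora, and the Selection Phase is omitted to keep the loop readable.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical meaning space: every (color, shape) pair.
COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "star"]
MEANINGS = list(product(COLORS, SHAPES))
ALPHABET = "abcdefg"


def founder_language(rng):
    """Founder generation: an arbitrary two-letter word for each meaning."""
    return {m: rng.choice(ALPHABET) + rng.choice(ALPHABET) for m in MEANINGS}


def transmit(language, bottleneck, noise, rng):
    """Transmission phase: the next learner hears only `bottleneck` meanings,
    and each letter it hears is replaced at random with probability `noise`."""
    heard = []
    for m in rng.sample(MEANINGS, bottleneck):
        word = "".join(rng.choice(ALPHABET) if rng.random() < noise else ch
                       for ch in language[m])
        heard.append((m, word))
    return heard


def learn(observations):
    """Hypothetical learner biased toward systematic mappings: it assigns each
    color a first letter and each shape a second letter by majority vote."""
    first = {c: Counter() for c in COLORS}
    second = {s: Counter() for s in SHAPES}
    for (color, shape), word in observations:
        first[color][word[0]] += 1
        second[shape][word[1]] += 1

    def vote(counter):
        # A feature the learner never heard falls back to a fixed default letter.
        return counter.most_common(1)[0][0] if counter else ALPHABET[0]

    return {(c, s): vote(first[c]) + vote(second[s]) for c, s in MEANINGS}


def is_compositional(language):
    """Evaluation phase: does every color keep one first letter and every shape one second letter?"""
    colors_ok = all(len({language[(c, s)][0] for s in SHAPES}) == 1 for c in COLORS)
    shapes_ok = all(len({language[(c, s)][1] for c in COLORS}) == 1 for s in SHAPES)
    return colors_ok and shapes_ok


def iterate(generations=10, bottleneck=6, noise=0.1, seed=0):
    """Run the chain and record whether each generation's language is compositional."""
    rng = random.Random(seed)
    language = founder_language(rng)
    history = [is_compositional(language)]
    for _ in range(generations):
        language = learn(transmit(language, bottleneck, noise, rng))
        history.append(is_compositional(language))
    return language, history


language, history = iterate()
print(history)
```

Because this toy learner is written to prefer systematic mappings, every generation after the founder comes out compositional. In the study, the article attributes the corresponding bias to network depth rather than to a hand-written rule, so replacing `learn` with a trained network is where the real experiment would begin.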

Architectural Design: Deep Linear Networks

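A critical methodological choice was the use of deep linear networks rather than complex nonlinear architectures. Linear networks, those composed exclusively of matrix multiplications without activation functions, are particularly instructive because they isolate the effect of depth itself, free of nonlinearities that can obscure underlying mechanisms. One subtlety matters here: a stack of linear layers computes the same kind of function as a single matrix, so depth does not change what a linear network can represent. What depth changes is how the network learns, because the loss is no longer a convex function of the weights and the learning dynamics become nonlinear.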
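Although a deep linear network has no activation functions, it gains no expressive power from depth: the product of its weight matrices is itself a single matrix. A minimal NumPy check, with layer sizes and random weights chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical three-layer linear network: 8 -> 16 -> 16 -> 4.
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(4, 16))
x = rng.normal(size=8)

deep_output = W3 @ (W2 @ (W1 @ x))  # apply the layers one by one
W_eff = W3 @ W2 @ W1                # the single equivalent 4 x 8 matrix
shallow_output = W_eff @ x

print(np.allclose(deep_output, shallow_output))  # True
```

The output is the same either way; what differs is the gradient each factor receives during training, which is why depth still changes how a linear network learns.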

The architecture featured multiple layers stacked in sequence, each performing a linear transformation on the input. The depth—the number of layers—was systematically varied across experimental conditions, enabling the researchers to isolate its effect on learning outcomes.

Crucially, the networks were trained using standard optimization procedures: gradient descent with backpropagation. The training objective was to reconstruct linguistic sequences from corrupted inputs, forcing each network to discover underlying compositional rules rather than memorizing surface forms.
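The training setup can be sketched in NumPy as a deep linear denoising autoencoder trained with full-batch gradient descent and hand-written backpropagation. The layer sizes, Gaussian input noise, learning rate, and initialization scale are assumptions chosen for illustration, since the article does not report the study's settings, and reconstructing vectors rather than token sequences is a simplification.

```python
import numpy as np


def train_deep_linear(X, depth=3, hidden=16, noise=0.1, lr=0.01, steps=2000, seed=0):
    """Train y = W_depth ... W_1 x to reconstruct clean rows of X from noisy copies.

    Returns the list of weight matrices and the loss recorded at each step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sizes = [d] + [hidden] * (depth - 1) + [d]
    # Small random initialization (an assumed scale).
    Ws = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i])) for i in range(depth)]
    losses = []
    for _ in range(steps):
        # Corrupt the input; the reconstruction target stays clean.
        X_noisy = X + noise * rng.normal(size=X.shape)
        # Forward pass, keeping every layer's activations (columns are examples).
        acts = [X_noisy.T]
        for W in Ws:
            acts.append(W @ acts[-1])
        err = acts[-1] - X.T
        losses.append(0.5 * np.sum(err ** 2) / n)
        # Backward pass: propagate the error down through the layers.
        grad = err / n
        for i in reversed(range(depth)):
            grad_W = grad @ acts[i].T
            grad = Ws[i].T @ grad  # uses the weights before this step's update
            Ws[i] -= lr * grad_W
    return Ws, losses


# Hypothetical data: 200 points lying in a 3-dimensional subspace of R^8.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))
Ws, losses = train_deep_linear(X)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Varying `depth` while holding everything else fixed is the kind of manipulation the study describes; setting `depth=1` gives the single-matrix baseline.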

This design allowed the team to pose a precise question: does iterated learning produce structured language in deep linear networks? The answer, revealed through careful experimentation, would have profound implications for our understanding of both human and artificial learning.

The methodology's strength lies in its control. By simulating iterated learning in a precise, reproducible environment, the researchers eliminated confounding variables that plague field studies of child language acquisition. They could manipulate precisely which aspects of the learning process—architectural depth, error profiles, transmission fidelity—were varied and measure their specific effects on linguistic structure.

Findings: The Power of Structural Depth

The central discovery of the study was both surprising and decisive: iterated learning produces structured, compositional language only when learners possess sufficient architectural depth. Shallow networks—those with only one or two layers—failed completely to develop systematic linguistic representations, regardless of training duration or data quantity. Deep linear networks, however, reliably evolved compositional communication systems with clear hierarchical structure.

The Depth Threshold

The researchers identified a critical threshold: networks required at least three to five processing layers before compositional structure emerged consistently. Below this threshold, learning remained local and shallow; above it, global patterns and systematic generalization appeared spontaneously. This finding underscores a fundamental insight: learnability is not just about having a learning mechanism; it is about having the right kind of learning machinery.

This depth threshold matters because it determines whether a learner can build internal hierarchical representations. In a shallow network, input reaches output through only one or two weight matrices, which limits the system to surface-level statistical associations. Deep networks, by contrast, can build progressively more abstract representations at each layer. By analogy with human language, one can picture this as first detecting basic phonetic features, then combining them into morphemes, then building words and phrases.

The transition was not gradual; it was a sharp phase change. Once depth crossed the threshold, compositional scores increased dramatically over just a few generations. This suggests that structural emergence is not linear accumulation but a qualitative reorganization triggered by sufficient representational capacity.
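Abrupt transitions are also a known property of learning inside a single deep linear network. In the exact solutions derived by Saxe, McClelland and Ganguli, a deep linear network started from small weights learns each input-output mode along a sigmoidal path: a long plateau followed by a rapid rise. A network without hidden layers, by contrast, starts improving at once and then slows exponentially. The sketch below integrates the scalar case with the Euler method, assuming balanced weights. The article does not say whether this within-training effect underlies the across-generation phase change, so treat that connection as an assumption.

```python
def learning_curve(depth, s=1.0, a0=1e-4, dt=0.01, t_max=15.0):
    """Euler-integrate gradient flow on L = 0.5 * (s - a)**2 for a scalar
    linear chain of `depth` equal (balanced) weights u, where a = u**depth.

    Each weight follows du/dt = (s - a) * u**(depth - 1).
    Returns a(t) sampled every dt, starting from a(0) = a0.
    """
    u = a0 ** (1.0 / depth)
    curve = []
    for _ in range(int(round(t_max / dt)) + 1):
        a = u ** depth
        curve.append(a)
        u += dt * (s - a) * u ** (depth - 1)
    return curve


shallow = learning_curve(depth=1)  # a' = s - a: exponential approach
deep = learning_curve(depth=2)     # a' = 2a(s - a): logistic (sigmoidal) curve
step = int(round(2.0 / 0.01))      # index of t = 2
print(f"t=2:  shallow {shallow[step]:.3f}, deep {deep[step]:.4f}")
print(f"t=15: shallow {shallow[-1]:.3f}, deep {deep[-1]:.3f}")
```

At t = 2 the shallow learner is already most of the way to the target while the deep one has barely moved; both end at the same solution. With the same `a0`, deeper chains wait even longer on the plateau.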

Compositionality and Systematicity

The structured languages that emerged exhibited two key properties:

Compositionality: The meaning of complex expressions could be derived from the meanings of their parts and the rules combining them. For instance, if the network learned that [word A] meant "red" and [word B] meant "circle," it could correctly interpret a novel combination [word A + word B] as "red circle."

Systematicity: The network applied learned rules consistently across new contexts. If it learned one grammatical transformation, it applied the same rule to other inputs rather than learning each case in isolation.

These properties did not arise from explicit programming; they emerged from the iterative transmission process itself. Systematic, compositional languages proved more robust to transmission errors: a mistake in one part of an utterance did not cause complete communication failure if the rest followed systematic rules, so these were the languages that survived from generation to generation.
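One reason compositional languages survive transmission is that they can be rebuilt from partial data. In the toy sketch below, which uses hypothetical words and a learner that is assumed to already know the word boundary, a learner hears three of nine meanings and must name the other six. For a compositional language that works perfectly; for a holistic language, whose words have no internal structure, the same strategy fails on every unseen meaning.

```python
from itertools import product

COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "star"]
MEANINGS = list(product(COLORS, SHAPES))

# Hypothetical languages over the same nine meanings.
COLOR_WORD = {"red": "ka", "green": "mo", "blue": "ti"}
SHAPE_WORD = {"circle": "pu", "square": "ne", "star": "lo"}
compositional = {(c, s): COLOR_WORD[c] + SHAPE_WORD[s] for c, s in MEANINGS}
holistic = dict(zip(MEANINGS, ["zubo", "kira", "feta", "malu", "sipo",
                               "gado", "nuve", "tolo", "riwa"]))

# The transmission bottleneck: the learner hears only three of the nine meanings,
# chosen so that every color and every shape occurs once.
SEEN = [("red", "circle"), ("green", "square"), ("blue", "star")]


def learn_and_generalize(language, seen):
    """Assumes the learner knows the word boundary: the first two letters
    name the color and the last two name the shape."""
    color_part = {c: language[(c, s)][:2] for c, s in seen}
    shape_part = {s: language[(c, s)][2:] for c, s in seen}
    return {(c, s): color_part[c] + shape_part[s]
            for c, s in MEANINGS if c in color_part and s in shape_part}


def unseen_accuracy(language, seen):
    """Fraction of meanings the learner never heard that it still names correctly."""
    produced = learn_and_generalize(language, seen)
    unseen = [m for m in MEANINGS if m not in seen]
    return sum(produced.get(m) == language[m] for m in unseen) / len(unseen)


print(unseen_accuracy(compositional, SEEN), unseen_accuracy(holistic, SEEN))  # 1.0 0.0
```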

Errors as Features, Not Bugs

The inclusion of controlled errors during transmission was not a bug but a feature. When the researchers removed error injection, languages failed to evolve structure at all. Errors created selective pressure for robustness: only systems that could recover from imperfections would survive transmission.

This finding directly mirrors child language acquisition. Children do not learn language from perfect, error-free input but from noisy, incomplete, sometimes contradictory data. Their ability to generalize systematically despite this noise is precisely what makes human language learnable.
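The robustness argument can be made concrete with a single transmission error. In this hypothetical sketch, a rote learner memorizes every word exactly as heard and so passes the error on. A systematic learner pools evidence across all words that share a feature and outvotes the corrupted syllable.

```python
from collections import Counter
from itertools import product

COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "star"]
# Hypothetical compositional language: a color syllable followed by a shape syllable.
COLOR_WORD = {"red": "ka", "green": "mo", "blue": "ti"}
SHAPE_WORD = {"circle": "pu", "square": "ne", "star": "lo"}
original = {(c, s): COLOR_WORD[c] + SHAPE_WORD[s] for c, s in product(COLORS, SHAPES)}

# One transmission error: the learner hears "xapu" instead of "kapu".
heard = dict(original)
heard[("red", "circle")] = "xapu"


def rote_learner(data):
    """Memorizes each word as heard, errors included."""
    return dict(data)


def systematic_learner(data):
    """Pools evidence across meanings: each color syllable and each shape syllable
    is chosen by majority vote over every word that contains that feature."""
    color_votes = {c: Counter() for c in COLORS}
    shape_votes = {s: Counter() for s in SHAPES}
    for (c, s), word in data.items():
        color_votes[c][word[:2]] += 1
        shape_votes[s][word[2:]] += 1
    return {(c, s): color_votes[c].most_common(1)[0][0] + shape_votes[s].most_common(1)[0][0]
            for c, s in data}


def errors(language):
    """Number of meanings whose word differs from the original language."""
    return sum(language[m] != original[m] for m in original)


print(errors(rote_learner(heard)), errors(systematic_learner(heard)))  # 1 0
```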

Quantitative Measures

The researchers measured structural emergence using several quantitative metrics:

Compositionality Score: Based on mutual information between semantic features and linguistic forms
Systematicity Score: Measured consistency of rule application across novel test cases
Transmission Fidelity: How much information survived each generation
Learnability Index: The speed at which new networks could acquire the emerging language
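The article does not spell out how these scores were computed, so the sketch below is only one plausible reading of the first metric. It computes plug-in mutual information in bits. Raw mutual information between a feature and a word segment is also maximal for a holistic language in which every word is unique, so the sketch then scores each segment by the gap between its best and second-best feature, normalized by the segment's entropy. This is in the style of the positional disentanglement score used in emergent-communication research, a swapped-in technique rather than the study's metric; the languages and the two-letter segmentation are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Plug-in Shannon entropy in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def positional_score(language, seg_len=2):
    """Average over word segments of (MI with best feature - MI with runner-up) / H(segment)."""
    meanings = list(language)
    n_features = len(meanings[0])
    n_segments = len(language[meanings[0]]) // seg_len
    scores = []
    for j in range(n_segments):
        seg = [language[m][j * seg_len:(j + 1) * seg_len] for m in meanings]
        mis = sorted((mutual_information([m[k] for m in meanings], seg)
                      for k in range(n_features)), reverse=True)
        h = entropy(seg)
        scores.append((mis[0] - mis[1]) / h if h > 0 else 0.0)
    return sum(scores) / len(scores)


COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "star"]
MEANINGS = list(product(COLORS, SHAPES))
COLOR_WORD = {"red": "ka", "green": "mo", "blue": "ti"}
SHAPE_WORD = {"circle": "pu", "square": "ne", "star": "lo"}
compositional = {(c, s): COLOR_WORD[c] + SHAPE_WORD[s] for c, s in MEANINGS}
holistic = dict(zip(MEANINGS, ["zubo", "kira", "feta", "malu", "sipo",
                               "gado", "nuve", "tolo", "riwa"]))

print(positional_score(compositional), positional_score(holistic))
```

The compositional language scores 1 (up to floating-point rounding) and the holistic one scores 0, even though every holistic word is perfectly informative about its meaning.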

All metrics increased significantly only in deep network conditions, indicating that depth is necessary, though not sufficient, for compositional language evolution.

The implications are clear: if we want artificial systems to develop structured, learnable language, they must have sufficient representational depth. This is not a limitation but a design principle: architectural maturity enables structural emergence.

Cognitive Linguistics and Developmental Mistakes

The cognitive linguistics dimension of this research provides perhaps its most compelling validation: the artificial learning dynamics closely mirror the error patterns and learning trajectories observed in real children. This convergence across biological and artificial systems suggests that we may be uncovering general principles of structured learning.

The Penguin Paradox and Non-Arbitrary Errors

One of the most illuminating examples comes from child language development: the famous "penguin confusion." Young children learn that birds have wings and that wings enable flight. When they first encounter a penguin, their immediate inference is that it should fly—a clear error by adult standards. Yet this error is not arbitrary; it follows systematically from their developing conceptual framework.

The penguin mistake reveals a profound truth about learning: children do not memorize facts in isolation. They build hierarchical models, and their errors occur at the boundaries where new information must be integrated with existing structure. This is exactly what Dr. Jarvis's research demonstrates in artificial systems: errors are not noise to be eliminated but signals that reveal the learner's current representational structure.

Overgeneralization and Rule Formation

Children consistently overgeneralize grammatical rules. They say "mouses" instead of "mice," "goed" instead of "went," and apply regular past tense markers to irregular verbs. These errors are predictable, systematic, and reveal the child's internal rule system.

The neural network experiments replicated this pattern. Early generations of networks applied the same transformation rule uniformly, even when exceptions existed in the training data. Only later generations, after multiple transmission cycles and error corrections, developed the ability to handle exceptions while preserving general rules.

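This mirrors historical linguistics: many irregular English verbs are relics of older patterns that were once productive and have since been replaced or modified over centuries. The irregular forms that persist tend to be the most frequent ones, such as "went" and "was." They are heard often enough to be memorized by every new generation of learners, while rarer irregulars are more likely to be overgeneralized and eventually regularized. The survivors persist not because they are simpler but because frequency carries them intact through a long chain of learners.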
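Overgeneralization and the survival of frequent irregulars can both be illustrated with a rule-plus-exceptions learner. In this sketch, the verbs are real but the frequencies are illustrative rather than measured, and the memorization threshold is a hypothetical parameter: the learner stores an irregular form only after hearing it often enough and otherwise applies the regular rule.

```python
# Hypothetical irregular verbs with illustrative (not measured) frequencies
# per 1,000 verb tokens a learner hears.
IRREGULAR = {"go": "went", "eat": "ate", "cleave": "clove"}
FREQUENCY = {"go": 20.0, "eat": 5.0, "cleave": 0.01}


def regular_past(verb):
    """The general rule: add -ed (or just -d after a final e)."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


def learned_irregulars(verbs_heard, threshold=10):
    """Hypothetical learner: it memorizes an irregular form only after hearing it
    at least `threshold` times."""
    return {verb: past for verb, past in IRREGULAR.items()
            if FREQUENCY[verb] * verbs_heard / 1000 >= threshold}


def past_tense(verb, memory):
    """Use a memorized exception if there is one; otherwise apply the rule."""
    return memory.get(verb, regular_past(verb))


early = learned_irregulars(verbs_heard=200)     # young learner, little exposure
later = learned_irregulars(verbs_heard=10_000)  # older learner, much more exposure
for verb in IRREGULAR:
    print(verb, past_tense(verb, early), past_tense(verb, later))
# go goed went
# eat eated ate
# cleave cleaved cleaved
```

The early learner overgeneralizes every verb, while the later learner recovers the frequent exceptions but still regularizes the rare "cleave." A chain of such learners would tend to keep "went" and lose "clove."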

The Gradual Path to Complexity

Children do not acquire language in a single leap. They progress through predictable stages: babbling, one-word utterances, two-word combinations, and gradually more complex syntax. Each stage builds on the previous one, adding layers of complexity.

The neural network learning trajectory mirrored this progression. Early generations produced simple, flat structures: single-word responses or very short phrases. In deeper networks, and over more transmission cycles, the outputs became hierarchical and recursive.

This staged development is crucial for learnability. If language required perfect mastery from the outset, it would be inaccessible to children with limited cognitive resources. Instead, language has evolved to be learnable in stages—each stage manageable on its own terms but collectively building toward full linguistic competence.

Error-Correcting Learning

The study demonstrated that successful learning requires error correction as a core mechanism. When the researchers trained networks on perfect data with no errors, the resulting representations were brittle and non-compositional. When errors were introduced during learning and correction was embedded in the transmission process, compositional structure emerged.

This finding is consistent with theories of child language acquisition that emphasize the role of caregiver feedback. Children's errors are not simply corrected; they are reframed as opportunities for system refinement. The error itself becomes data for updating the internal model.

The research provides computational backing for this view: errors are not noise that needs filtering but signal that needs interpretation. The real magic of learning occurs at the boundary where expected patterns meet unexpected data—the exact moment when both children and neural networks face their greatest challenges and opportunity for growth.

Implications for Generative AI

For artificial intelligence researchers and developers, the implications of this work are both immediate and transformative. If language structure emerges through iterated learning in deep linear networks, then current approaches to training large language models may be missing opportunities for, and possibly creating obstacles to, true linguistic understanding.

Those obstacles may include attention limits and reasoning bottlenecks, such as those highlighted in our piece on The Attention Wall.

Deep Linear Insights for Deep Learning

The most direct implication is architectural: if depth enables compositional emergence, then current deep learning architectures may need reevaluation. Many modern LLMs are very wide, with hidden dimensions in the thousands and millions of parameters per layer, while their depth remains comparatively modest. The research suggests that adding more layers could produce gains in systematicity and compositionality. Because stacked linear layers collapse to a single linear map, any such gain from extra linear layers would come from their effect on learning dynamics rather than from added expressive power.

Moreover, the study shows that linearity is a useful research tool rather than merely a limitation. Linear networks are mathematically tractable, allowing precise analysis of how representational depth interacts with learning dynamics. This tractability enabled the researchers to isolate variables that would be obscured in nonlinear models.

For generative AI practitioners, this suggests a new design space: hybrid architectures that combine linear depth for structure discovery with strategic nonlinearity for expression. Rather than viewing depth and linearity as constraints, we might see them as foundational layers upon which linguistic structure can build.

The Learnability Imperative

Current LLM training focuses heavily on scale—more data, more parameters, longer contexts. The iterated learning framework suggests a different priority: learnability. Systems should be designed not just to handle more information but to organize it in ways that are inherently learnable by subsequent processing stages.

This has several concrete implications:

Modular Design: Breaking models into learnable modules that can be composed, similar to how human language combines words and phrases.
Structured Outputs: Constraining outputs to follow systematic patterns rather than free-form generation, making them more predictable and verifiable.
Error-Aware Training: Incorporating deliberate errors during training to force the model to develop robust internal representations.
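Error-aware training is often implemented as a denoising objective: corrupt the input, keep the clean sequence as the target. The sketch below is loosely modeled on masked-token corruption; the masking and substitution scheme, the `<mask>` token, and the default corruption rate are assumptions, not a recipe from the study.

```python
import random


def corrupt(tokens, p, rng, vocab, mask="<mask>"):
    """Return a noisy copy of `tokens`: each token is, with probability p,
    replaced by `mask` (half the time) or by a random vocabulary token."""
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p / 2:
            noisy.append(mask)
        elif r < p:
            noisy.append(rng.choice(vocab))
        else:
            noisy.append(tok)
    return noisy


def make_denoising_pairs(sentences, p=0.15, seed=0):
    """Build (corrupted input, clean target) pairs for a reconstruction objective."""
    rng = random.Random(seed)
    vocab = sorted({tok for sentence in sentences for tok in sentence})
    return [(corrupt(sentence, p, rng, vocab), sentence) for sentence in sentences]


corpus = [["the", "red", "circle"], ["a", "blue", "star"]]
for noisy, clean in make_denoising_pairs(corpus, p=0.3):
    print(noisy, "->", clean)
```

Training a model to map each corrupted input back to its clean target forces it to rely on the surrounding structure rather than on any single token.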

Training Paradigm Shifts

Most current training procedures optimize for final output quality, treating intermediate representations as means to an end. The iterated learning framework inverts this priority: the representation itself is the target.

Instead of training a single model to maximize accuracy on a test set, we could train a series of models in which each generation learns from the previous one's outputs. This iterative refinement process mirrors how human language evolved and may produce more systematic, compositional representations.

Practical Applications

Several immediate applications emerge from this research:

Few-Shot Learning: Systems trained with iterated learning principles may generalize better from limited examples, as they learn underlying compositional rules rather than surface patterns.
Domain Transfer: A learnable language structure would transfer more effectively between domains, as the compositional rules remain consistent even when surface forms change.
Interpretability: Compositional systems are inherently more interpretable. If an AI's decisions follow systematic rules, we can trace its reasoning step by step.
Educational AI: Systems that understand how learnability works can adapt their outputs to match the learner's current representational capacity—teaching machines to teach other machines.

Beyond Language Models

The principles discovered apply beyond language. Any system requiring structured representation—computer vision, robotics control, scientific simulation—could benefit from iterated learning approaches. The core insight—that depth enables composition—suggests that architectural maturity may be more important than data volume in many domains.

The work of Dr. Jarvis and the Wits MIND Institute does not merely advance technical AI; it reorients our philosophical understanding of intelligence. Structure is not designed; it emerges. Learning is not performed; it evolves. Intelligence is not an endpoint but a process of continuous refinement through transmission and adaptation.

In this light, the most powerful AI systems may not be those built by human engineers alone but those that learn to build themselves through iterated transmission—a digital echo of the ancient, biological process that gave us language in the first place.

Conclusion: Bridging the Divide

The research conducted by Dr. Devon Jarvis and the Wits MIND Institute represents a rare convergence of disciplines—cognitive science, linguistics, and artificial intelligence—all pointing toward the same fundamental truth: language evolves for learnability.

This principle, simple in its statement yet profound in its implications, reshapes how we think about both human development and machine intelligence. When we understand that language is optimized not for adult speakers but for child learners, we see the patterns of linguistic structure as solutions to a specific problem: transmission across generations of imperfect learners.

The artificial experiments conducted by Jarvis's team provide computational evidence that this optimization is not metaphorical but literal. By simulating iterated learning in deep linear networks, the researchers demonstrated that compositional structure emerges naturally when learners possess sufficient representational depth. The emergence is not preordained; it is contingent on architecture and process, on having a learning system capable of building hierarchical representations.

This finding has ripple effects across fields. For cognitive scientists, it provides a formal framework for understanding how children acquire language: not through memorization but through structured learning processes that reward systematicity and penalize arbitrariness. For linguists, it explains why human languages across the globe share certain structural regularities: they are not historical accidents but evolved solutions to the problem of learnability.

For AI researchers, the implications are equally transformative. If we want systems that generalize systematically rather than memorize pattern associations, we must design them with learnability in mind. This means prioritizing depth over width in some cases, embedding error correction into training pipelines, and viewing language structure as something that emerges rather than something that is designed.

The research also challenges assumptions about AI development. The deep linear network experiments show that sophisticated linguistic behavior can emerge from surprisingly simple architectures—provided they have the right structure and are trained through iterated learning. This suggests that we may be over-engineering our models, adding complexity where depth and proper training procedures would suffice.

Ultimately, the study offers a unified theory of structured learning. Whether biological or artificial, whether child or machine, the path to linguistic competence follows a similar trajectory: through iterative transmission, controlled error, and architectural maturity. The convergence across domains suggests that we have uncovered not just a computational curiosity but a fundamental principle of intelligent systems.

As we move forward, the work of Dr. Jarvis and his colleagues serves as both a compass and a challenge. It directs our attention not toward larger datasets or faster hardware but toward the fundamental mechanisms of learning itself. If we want AI that truly understands language, we must understand how humans learned to speak in the first place.

The iterated learning paradigm offers a path forward—one where structure emerges not from top-down design but from bottom-up transmission processes. It is a path that honors the complexity of language while respecting its learnability, bridging the divide between human cognition and artificial intelligence.

This is not merely an academic exercise. It is a roadmap for building systems that do not just mimic language but understand it—systems whose outputs are systematic, compositional, and learnable. In this light, the study represents not just a contribution to technical AI but a contribution to our understanding of intelligence itself.

The future of language technology may not lie in bigger models but in better learning processes—processes that mirror the biological evolution of language and the developmental trajectory of human children. By embracing iterated learning as a design principle, we may finally achieve the kind of systematic, compositional intelligence that allows machines to not just process language but participate in it meaningfully.

The convergence of deep learning and cognitive linguistics, as demonstrated by this research, suggests that we are entering a new era—one where AI development is informed not just by engineering but by science, and where the quest for artificial language understanding follows the same evolutionary principles that shaped human communication over millennia.
