Teach Recurrent Neural Networks Basic Skills to Improve Learning

Summary: Researchers at New York University demonstrate that artificial intelligence systems learn more effectively when they first master simple tasks. By training recurrent neural networks (RNNs) with a staged, “kindergarten curriculum” that begins with basic skills and then composes them into more complex behaviors, the team achieved faster, more robust learning on a challenging decision-making task.

Drawing inspiration from animal learning experiments, the researchers showed that RNNs benefit from the same stepwise developmental approach humans use—learn simple building blocks first, then combine them. Their combined laboratory and computational work indicates this compositional pretraining can give AI systems a practical, biologically informed advantage.

Key Facts:

Curriculum Boost: RNNs pretrained on simple component tasks reach solutions to complex tasks more quickly than networks trained conventionally.
Animal-Inspired Design: Laboratory rats learned to combine basic sensory cues and actions to retrieve water, providing a model for how complex behavior builds from simple skills.
Human-Like Learning: A developmentally inspired training strategy—here called kindergarten curriculum learning—encourages RNNs to learn modular computations that generalize to harder problems.

Source: NYU

We learn letters before we learn to read, and numbers before arithmetic.

A team of scientists at New York University tested the same principle for AI. In a study published in Nature Machine Intelligence, they found that recurrent neural networks trained on a sequence of simpler tasks before tackling a complex one learn more efficiently and behave more like animals solving related problems.

The authors named the training protocol “kindergarten curriculum learning” because it explicitly builds foundational skills first and then composes those skills to solve harder problems. This mirrors how humans and animals develop layered competencies over time.

“From an early age, we acquire basic skills—like balancing or manipulating an object—that later combine to support sophisticated actions,” says Cristina Savin, Associate Professor in NYU’s Center for Neural Science and Center for Data Science. “Applying this progression to RNNs lets them store elementary computations and later reuse those primitives to perform complex tasks.”

RNNs are designed to process sequential information by maintaining internal states, which makes them useful for applications such as speech recognition and language translation. However, conventional training approaches can struggle when networks must discover and combine multiple cognitive operations required by complex behaviors.
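To make the idea of an internal state concrete, here is a minimal sketch of a single-unit recurrent update in plain Python. It is an illustration only, not the architecture used in the study; the function names and the weights `w_rec` and `w_in` are assumptions chosen for readability.

```python
import math

def rnn_step(h, x, w_rec=0.9, w_in=0.5):
    """One update of a single-unit recurrent network (illustrative weights).

    The new state depends on both the previous state h and the current input x.
    """
    return math.tanh(w_rec * h + w_in * x)

def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the unit and return the state after each step."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(h, x)
        states.append(h)
    return states

# A brief cue at the first time step, followed by silence.
with_cue = run_rnn([1.0, 0.0, 0.0, 0.0])
# The same silent period without any cue.
without_cue = run_rnn([0.0, 0.0, 0.0, 0.0])
```

In this sketch the final state after the cue remains above zero even though the last three inputs are zero, while the no-cue run stays at zero. That lingering trace is what lets a recurrent network relate the current input to earlier ones.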

To ground their computational approach, the team first ran behavioral experiments with laboratory rats. The animals were trained to collect water from one of several ports inside a testing box. Success required the rats to recognize auditory and visual cues that signaled upcoming water delivery, to wait for a delay after those cues, and then to act at the correct port.

This task structure forced the rats to learn multiple simple associations—sound precedes water, light marks the correct port, wait before sampling—and then to compose these subskills to achieve the overall goal of water retrieval. The observed animal behavior revealed how compositional learning supports complex decision-making in biological systems.

Inspired by these findings, the researchers designed an analogous training scheme for RNNs. Instead of water retrieval, the networks were trained on a temporal wagering task that required combining basic decision-making computations to accumulate reward over time. The investigators created a pretraining curriculum of simpler cognitive subtasks that reflect the task’s relevant subcomputations.
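The staged structure of such a curriculum can be sketched as a loop that trains on each stage in order and carries the learned parameters forward. The example below is a toy under stated assumptions: a two-parameter linear model stands in for the RNN, and the stage names and data are hypothetical, not the subtasks used in the study.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]
Dataset = List[Tuple[float, float]]  # (input, target) pairs

def fit(params: Params, data: Dataset, lr: float = 0.1, epochs: int = 200) -> Params:
    """Train the toy model y = w * x + b with per-sample gradient descent."""
    w, b = params["w"], params["b"]
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on this sample
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}

def run_curriculum(stages: List[Tuple[str, Dataset]], init: Params) -> Tuple[Params, List[str]]:
    """Train on each stage in order, carrying parameters forward between stages."""
    params = dict(init)
    order = []
    for name, data in stages:
        params = fit(params, data)  # start from what earlier stages learned
        order.append(name)
    return params, order

# Hypothetical stages: two simple subtasks, then the full target task (y = 2x + 1).
stages = [
    ("subtask_offset", [(0.0, 1.0)]),
    ("subtask_slope", [(1.0, 3.0), (2.0, 5.0)]),
    ("target_task", [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]),
]
final_params, order = run_curriculum(stages, {"w": 0.0, "b": 0.0})
```

The key design point is that `run_curriculum` never resets the parameters: each stage begins where the previous one ended, so skills acquired early are available when the target task arrives.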

Compared with conventionally trained networks, RNNs that received compositional pretraining learned more quickly and adopted strategies resembling those observed in rats. Specifically, kindergarten curriculum learning promoted the emergence of slow-timescale dynamics in the networks, enabling long-timescale inference of latent states and value-based decision-making, features that standard pretraining failed to capture.
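One way to picture why slow dynamics matter is a leaky unit whose state decays by a factor of `1 - 1/tau` per step. The sketch below is a generic textbook-style illustration, not the analysis from the paper; `decay_trace` and the values of `tau` are assumptions picked for the example.

```python
def decay_trace(tau, steps, h0=1.0):
    """Return the state of a leaky unit over time with no further input.

    Each step multiplies the state by (1 - 1/tau), so a larger tau means
    slower forgetting.
    """
    h = h0
    trace = [h]
    for _ in range(steps):
        h = (1.0 - 1.0 / tau) * h
        trace.append(h)
    return trace

fast = decay_trace(tau=2.0, steps=20)   # short time constant
slow = decay_trace(tau=50.0, steps=20)  # long time constant
```

After 20 steps the slow unit still holds roughly two thirds of its starting value, while the fast unit has lost almost all of it. A network with units like the slow one can keep evidence around long enough to infer a hidden state that changes only occasionally.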

“AI agents need a kind of kindergarten to build the inductive biases required for complex behavior,” notes Savin. “Our results point toward training paradigms that reflect developmental processes and better leverage prior experience to accelerate learning of new skills.”

Funding: This research was supported by grants from the National Institute of Mental Health (1R01MH125571-01, 1K01MH132043-01A1) and used research computing resources provided by the Empire AI consortium, with additional support from the State of New York, the Simons Foundation, and the Secunda Family Foundation.

About this AI and learning research news

Author: James Devitt
Contact: James Devitt – NYU

Original Research: Closed access.
Title: “Compositional pretraining improves computational efficiency and matches animal behaviour on complex tasks” by Cristina Savin et al., published in Nature Machine Intelligence.

Abstract

Recurrent neural networks (RNNs) are commonly used in neuroscience to model neural dynamics and behavior. Yet, standard training approaches often fall short when networks must solve complex cognitive tasks that combine multiple computations. Here, the authors introduce a principled method for identifying and incorporating compositional subtasks into RNN training, a strategy they call kindergarten curriculum learning.

Using a temporal wagering task previously studied in rats as the target, the team developed a pretraining curriculum composed of simpler cognitive tasks that represent relevant subcomputations. They demonstrate that this compositional pretraining substantially improves learning efficiency and is essential for RNNs to adopt strategies similar to those of rats—most notably, long-timescale inference of latent states, which conventional pretraining fails to reproduce.

Mechanistically, the pretraining encourages the development of slow dynamical features in the networks that support both inference and value-based decision-making. Overall, this approach imbues RNNs with useful inductive biases that are important for modeling complex behaviors relying on multiple cognitive functions.