How AI Uses Language to Master Human Tasks

Summary: Researchers at the University of Geneva have developed an artificial intelligence system that can learn new tasks from verbal or written instructions and then describe those tasks linguistically to another AI so it can perform them. This advance demonstrates, for the first time in a neural model, the ability to transform language into sensorimotor actions and to communicate those actions to a peer—an important step for AI, natural language processing, and robotics.

The team combined a large pre-trained language model with a smaller, trainable neural network to simulate brain regions involved in language perception and production. The resulting system links linguistic representations to sensorimotor behavior, enabling zero-shot generalization—performing tasks it had never practiced—based solely on natural language instructions. The findings, published in Nature Neuroscience, point toward new ways machines could learn from and teach each other in human-like ways, with clear implications for humanoid robots and collaborative AI agents.

Key Facts:

Human-like instruction following: The model performs novel tasks using only verbal or written instructions and can describe those tasks to a second AI that reproduces them.
Neural model integration: Researchers connected a large pre-trained language model (S-Bert) to a smaller sensorimotor network to emulate language perception (Wernicke-like) and production (Broca-like) functions.
Robotics and collaboration: The approach suggests a route to robots and agents that understand instructions, translate them into actions, and communicate procedures to peers.

Source: University of Geneva

Interpreting linguistic instructions to perform new tasks and then explaining those tasks to others is a hallmark of human cognition. Unlike most animals, humans can teach one another using language rather than relying solely on repeated training with feedback. Recreating that faculty in machines is a core goal of natural language processing and sensorimotor AI.

The UNIGE team built a neural model that places language and action in a single functional framework. They started with S-Bert, a pre-trained language model of roughly 300 million parameters that is trained to represent the meaning of sentences. They connected this language component to a much smaller network of a few thousand units, which was trained to generate sensorimotor responses. Training proceeded in stages: the combined system first learned to emulate Wernicke's area, which interprets instructions, and then Broca's area, which produces verbal output. As a result, the network could both follow instructions and generate descriptions of tasks.
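The wiring described above can be sketched in a few lines: a fixed language encoder turns an instruction into a vector, and a small recurrent network receives that vector alongside the sensory input and produces motor output. This is a minimal sketch under stated assumptions, not the authors' model. The bag-of-words `toy_embed` is a hypothetical stand-in for S-Bert and captures none of its semantics, the names `SensorimotorRNN` and `run` are illustrative, and the weights are random and untrained.

```python
import math
import random

EMBED_DIM = 16


def toy_embed(instruction: str, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for the pretrained encoder (S-Bert in the study).

    Each word is hashed into a count vector, which is then L2-normalized.
    It only supplies a fixed-length vector; it encodes no real meaning.
    """
    vec = [0.0] * dim
    for word in instruction.lower().split():
        # Deterministic bucket (the built-in hash() of str varies between runs).
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class SensorimotorRNN:
    """Small vanilla RNN: each step sees the sensory input plus the instruction vector."""

    def __init__(self, n_sensory: int, n_hidden: int, n_motor: int,
                 embed_dim: int = EMBED_DIM, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = n_sensory + embed_dim
        self.n_hidden = n_hidden
        # Random, untrained weights; training is outside the scope of this sketch.
        self.w_in = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                     for _ in range(n_hidden)]
        self.w_rec = [[rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                      for _ in range(n_hidden)]
        self.w_out = [[rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                      for _ in range(n_motor)]

    def run(self, stimuli: list[list[float]], instruction: list[float]) -> list[list[float]]:
        h = [0.0] * self.n_hidden
        motor_trace = []
        for stim in stimuli:
            # In this sketch the same instruction vector is fed at every time step.
            x = stim + instruction
            h = [math.tanh(sum(w * xi for w, xi in zip(row_in, x))
                           + sum(w * hj for w, hj in zip(row_rec, h)))
                 for row_in, row_rec in zip(self.w_in, self.w_rec)]
            motor_trace.append([sum(w * hj for w, hj in zip(row, h)) for row in self.w_out])
        return motor_trace


emb = toy_embed("respond in the direction of the stimulus")
net = SensorimotorRNN(n_sensory=4, n_hidden=32, n_motor=2)
trace = net.run([[1.0, 0.0, 0.0, 0.0]] * 5, emb)
print(len(trace), len(trace[0]))  # 5 2
```

The point of the design is that the instruction enters only as an input vector: swapping the sentence changes the network's behavior without changing its weights, which is what makes zero-shot use of a new instruction possible in principle.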

Image: In the first stage of the experiment, the neuroscientists trained the network to simulate Wernicke's area, the part of the brain that enables us to perceive and interpret language. Credit: Neuroscience News

The experimental tasks were drawn from standard psychophysical paradigms, for example: pointing to the location (left or right) where a stimulus appears; responding in the direction opposite a cue; or choosing the brighter of two visual stimuli that differ slightly in contrast. After training on a subset of tasks, the model received written English instructions describing tasks it had never practiced. The best-performing models averaged 83% correct on these unseen tasks, using language input alone.
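The three example tasks can be written as rules that map a stimulus to the correct response, which is what "performing the task" means when a trial is scored. This is a sketch under assumptions: the function names, the `Stimulus` type, the instruction wording, and the trained/held-out split are all hypothetical and do not reproduce the study's task set or sentences.

```python
from dataclasses import dataclass

OPPOSITE = {"left": "right", "right": "left"}


@dataclass
class Stimulus:
    location: str    # "left" or "right"
    contrast: float  # arbitrary units; higher means brighter


def pro_target(stim: Stimulus) -> str:
    """Point to the location where the stimulus appears."""
    return stim.location


def anti_target(stim: Stimulus) -> str:
    """Respond in the direction opposite the cue."""
    return OPPOSITE[stim.location]


def brighter_target(a: Stimulus, b: Stimulus) -> str:
    """Choose the location of the higher-contrast stimulus (ties go to b)."""
    return a.location if a.contrast > b.contrast else b.location


# Hypothetical instruction wording for each task; the study's sentences differ.
INSTRUCTIONS = {
    "pro": "point to the side where the stimulus appears",
    "anti": "respond in the direction opposite the stimulus",
    "brighter": "choose the brighter of the two stimuli",
}
# Hypothetical split: practiced tasks vs. a task known only through its instruction.
TRAINED = {"pro", "brighter"}
HELD_OUT = {"anti"}

print(anti_target(Stimulus("left", 0.5)))                                # right
print(brighter_target(Stimulus("left", 0.42), Stimulus("right", 0.48)))  # right
```

The rules only define what counts as correct; in a zero-shot test the network never sees `anti_target` during training and must infer the required behavior from the instruction text.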

Crucially, once the first network had learned a task, it could generate a linguistic description of it. A second network, a copy of the first, then received that description and was able to perform the task as well. According to the authors, this is the first demonstration of two artificial networks communicating purely through natural language, with one guiding the other's sensorimotor behavior.
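The exchange between the two networks can be illustrated with a toy speaker and listener whose only shared channel is a sentence. In the study both roles are learned by networks; here they are replaced by hypothetical hand-written stand-ins: the speaker identifies which known rule fits the observed feedback and retrieves a fixed description (retrieval, not generation), and the listener maps the text to the closest known description with a toy bag-of-words embedding. All names and sentences are illustrative assumptions.

```python
import math

OPPOSITE = {"left": "right", "right": "left"}

# Hypothetical task rules both agents share (stand-ins for learned behavior).
RULES = {
    "pro": lambda side: side,
    "anti": lambda side: OPPOSITE[side],
}
# Hypothetical candidate descriptions (a retrieval stand-in for language production).
DESCRIPTIONS = {
    "pro": "point to the side where the stimulus appears",
    "anti": "point to the side opposite the stimulus",
}


def embed(text: str, dim: int = 32) -> list[float]:
    """Toy bag-of-words embedding, L2-normalized (not S-Bert)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # both inputs are already unit length


def speaker_describe(feedback: list[tuple[str, str]]) -> str:
    """Agent A: find the rule consistent with (stimulus side, rewarded response)
    pairs and return its description. Only this text is passed on."""
    for task, rule in RULES.items():
        if all(rule(side) == response for side, response in feedback):
            return DESCRIPTIONS[task]
    raise ValueError("no known rule matches the feedback")


def listener_act(description: str, side: str) -> str:
    """Agent B: sees only the text, picks the closest known task, and responds."""
    d = embed(description)
    task = max(DESCRIPTIONS, key=lambda t: cosine(d, embed(DESCRIPTIONS[t])))
    return RULES[task](side)


message = speaker_describe([("left", "right"), ("right", "left")])
print(message)                        # point to the side opposite the stimulus
print(listener_act(message, "left"))  # right
```

The listener never receives the speaker's internal state or the rule name, only the sentence, which mirrors the constraint that the second network is guided purely by language.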

Implications for robotics and AI collaboration

By showing that the geometry of the network's sensorimotor representations mirrors the semantic relationships between instructions, the researchers illustrate how language can scaffold flexible behavior and compositional generalization. The architecture yields experimentally testable predictions about how language must be represented in the human brain to support flexible cognition. At the same time, it offers an engineering pathway toward robots that understand instructions, perform new tasks without trial-and-error training on them, and teach other machines.
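One way to picture "shared geometry" is as a displacement that is reused across contexts: if the shift from a "pro" task to its "anti" counterpart is roughly the same for single-stimulus and comparison tasks, adding that shift to a known task can land near a combination that was never used in the computation. The sketch below is a toy illustration only; the four-dimensional vectors and task names are invented by hand and are not taken from the study, which analyzed its networks' activity with its own methods.

```python
import math

# Hypothetical, hand-made 4-D "sensorimotor representations" for four tasks.
# Axes loosely encode: [pro, anti, single stimulus, compare two stimuli].
REPS = {
    "pro_single":   [0.9, 0.1, 1.0, 0.0],
    "anti_single":  [0.1, 0.95, 1.05, 0.0],
    "pro_compare":  [1.0, 0.05, 0.0, 0.9],
    "anti_compare": [0.0, 1.0, 0.0, 1.0],  # not used to build the prediction below
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(base: str, minus: str, plus: str) -> list[float]:
    """Element-wise base - minus + plus."""
    return [b - m + p for b, m, p in zip(REPS[base], REPS[minus], REPS[plus])]


# Measure the pro -> anti shift on single-stimulus tasks, apply it to the compare task.
predicted = analogy("pro_compare", "pro_single", "anti_single")
nearest = max(REPS, key=lambda name: cosine(predicted, REPS[name]))
print(nearest)  # anti_compare
```

The prediction is built only from the other three tasks, yet its nearest neighbor is the fourth; that is the sense in which consistent geometry lets learned components be recombined for a new context.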

The sensorimotor network used in the study is deliberately compact, which suggests that linking language to action does not require enormous computational scale. Larger, more sophisticated implementations could therefore be incorporated into humanoid robots and multi-agent systems, enabling natural linguistic coordination among machines and between machines and humans.

About this AI research news

Author: Antoine Guenot
Source: University of Geneva
Contact: Antoine Guenot – University of Geneva
Image: Neuroscience News

Original Research: Open access. “Natural Language Instructions Induce Compositional Generalization in Networks of Neurons” by Alexandre Pouget et al., published in Nature Neuroscience.

Abstract

Natural Language Instructions Induce Compositional Generalization in Networks of Neurons

Humans can interpret linguistic instructions to perform novel tasks without direct experience. The neural computations that support this capacity are not fully understood. Leveraging advances in natural language processing, the authors build neural models that receive instructions embedded by a pretrained language encoder and are trained on a set of psychophysical tasks. The best models in this study achieved an average of 83% correct on previously unseen tasks using only linguistic instructions (zero-shot learning).

The results indicate that language structures sensorimotor representations so that activity for related tasks shares geometric relationships with the semantic content of instructions. This alignment allows language to cue the correct composition of learned skills in new contexts. The model can also generate a linguistic description of a novel task derived solely from motor feedback, which can then guide a partner model to perform the task. The authors propose experimental predictions about how linguistic information must be represented to enable flexible, general cognition in the human brain and offer a roadmap for building collaborative, language-capable AI systems.
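The abstract's headline number is an average of percent correct over previously unseen tasks. A minimal sketch of such a zero-shot score follows, assuming each held-out task is weighted equally; the abstract does not state the averaging scheme, and the task names and trial outcomes below are invented for illustration, not the study's data.

```python
def percent_correct(trials: list[bool]) -> float:
    return 100 * sum(trials) / len(trials)


def zero_shot_average(results: dict[str, list[bool]],
                      trained: set[str], held_out: set[str]) -> float:
    """Mean of per-task accuracy over tasks never used in training.

    Assumption: every held-out task counts equally, regardless of trial count.
    """
    if not trained.isdisjoint(held_out):
        raise ValueError("a held-out task was also used for training")
    scores = [percent_correct(results[task]) for task in sorted(held_out)]
    return sum(scores) / len(scores)


# Invented trial outcomes with hypothetical task names (not the study's data).
results = {
    "anti_compare": [True] * 8 + [False] * 2,   # 80%
    "pro_delay":    [True] * 18 + [False] * 2,  # 90%
}
score = zero_shot_average(results, trained={"pro_single", "anti_single"},
                          held_out={"anti_compare", "pro_delay"})
print(score)  # 85.0
```

The weighting choice matters: pooling all 30 trials instead would give 26/30, about 86.7%, because the task with more trials dominates. Averaging per task keeps one easy, heavily sampled task from masking poor generalization on another.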