How Two Brain Learning Systems Drive Habit Formation

Summary: The brain relies on two distinct dopamine-based learning systems: one that evaluates outcomes and another that reinforces repeated actions. Known as reward prediction error (RPE) and action prediction error (APE), these parallel systems explain how habits form and why they become difficult to change.

RPE enables learning from outcomes, signaling whether an event is better or worse than expected. APE, identified in this study, strengthens behaviors that are repeated frequently, allowing the brain to store a default action policy. This frees cognitive resources for other decisions and supports multitasking. Experiments show that damaging the tail of the striatum—where APE is encoded—prevents mice from forming habits, highlighting this region’s essential role in habitual learning.

Key Facts:

Second learning system discovered: An action prediction error (APE) signal reinforces frequent actions and supports habit formation.
Distinct brain region: The tail of the striatum appears dedicated to movement-related learning and is separate from regions that encode outcome value.
Clinical relevance: Understanding APE may inform new treatments for addiction, compulsive behaviours and aspects of Parkinson’s disease.

Source: Sainsbury Wellcome Centre

Neuroscientists at the Sainsbury Wellcome Centre (SWC) at UCL have demonstrated that learning by trial and error relies on two complementary dopaminergic systems.

This discovery—the identification of a second, movement-related teaching signal—offers a clearer explanation for how stable habits form and suggests new avenues for treating disorders linked to habitual behaviour, including some forms of addiction and compulsions. Published in Nature, the mouse study may also have implications for Parkinson’s research.

Image caption: This research also has potential implications for Parkinson’s disease, which is known to be caused by the death of midbrain dopamine neurons, specifically in the substantia nigra pars compacta. Credit: Neuroscience News

“We’ve identified a mechanism that appears to underlie habitual behaviour,” said Dr Marcus Stephenson-Jones, Group Leader at SWC and lead author of the study. “Once a preference for a particular action develops, the brain can bypass value-based decision making and rely on a default policy shaped by prior actions. That lets you free up cognitive resources to evaluate other choices.”

Previously, dopamine was primarily associated with reward prediction errors (RPE) that signal whether outcomes are better or worse than expected. The new experiments reveal a second dopaminergic signal—action prediction error (APE)—that encodes how often an action is performed and reinforces repeated associations regardless of their immediate value.
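
As a rough illustration of the difference between the two signals, the Python sketch below computes each error for a single trial. It is not the model from the study: the function names, the learning rate, and the choice to represent the action prediction as a list of probabilities are all assumptions made for this example. What it tries to show is that the RPE depends on the outcome, while the APE depends only on which action was taken.

```python
def reward_prediction_error(reward, expected_value):
    """RPE: how much better or worse the outcome was than expected."""
    return reward - expected_value


def action_prediction_error(action_taken, predicted_probs):
    """APE (assumed form): 1 for the action actually taken, 0 for the others,
    minus how strongly each action was predicted. Reward is not an input."""
    return [
        (1.0 if action == action_taken else 0.0) - predicted
        for action, predicted in enumerate(predicted_probs)
    ]


def update_habit(predicted_probs, action_taken, learning_rate=0.1):
    """Nudge the action prediction toward the action that was just performed."""
    errors = action_prediction_error(action_taken, predicted_probs)
    return [p + learning_rate * e for p, e in zip(predicted_probs, errors)]


# Hypothetical single trial: the animal expected a value of 0.25, received a
# reward of 1.0, and chose action 0 when both actions were equally predicted.
rpe = reward_prediction_error(1.0, 0.25)
ape = action_prediction_error(0, [0.5, 0.5])
new_habit = update_habit([0.5, 0.5], 0)
```

In this sketch `action_prediction_error` takes no reward argument, which is one simple way to express the value-free nature of the signal. Repeating `update_habit` with the same action pushes its predicted probability toward 1 whether or not that action is rewarded.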

Together, RPE and APE allow animals to learn in two complementary ways: selecting the most valuable option or defaulting to the most frequently performed action. As an everyday example, first-time choices at a sandwich shop involve deliberation and value-based evaluation; after many visits, selecting the same sandwich becomes automatic because a default action has been reinforced.

This value-free storage of repetitive behaviour is computationally simpler than constantly comparing option values, which likely makes multitasking more efficient. For instance, experienced drivers can hold conversations while driving because habitual motor control is handled by an automatic system, freeing the value-based system to focus on other decisions.

Earlier work had located learning-related dopamine neurons across three midbrain regions: the ventral tegmental area and two parts of the substantia nigra (pars compacta and pars lateralis). Some of these neurons code for reward, while others are movement-related. The new study maps this functional distinction onto striatal targets: RPE neurons project broadly across the striatum except the tail, while movement-specific neurons project everywhere except the nucleus accumbens. This suggests the nucleus accumbens is dedicated to reward signalling, and the tail of the striatum to movement-related signals.

Focusing on the tail of the striatum, researchers used an auditory discrimination task in mice and a genetically encoded dopamine sensor to measure dopamine release. The sensor showed that dopamine activity in the tail correlated with movement rather than reward. Lesion experiments revealed that mice lacking the tail of the striatum learned initially like controls but failed to transition to expert, habitual performance once a preference emerged. Lesioned mice continued to improve only gradually, relying solely on RPE, whereas intact mice used both RPE and APE to consolidate fast, stable performance.

Silencing the tail of the striatum in expert mice produced a dramatic loss of task performance, demonstrating that late-stage, habitual behaviour depends on this movement-related dopaminergic signal. Computational modelling, led by Dr Claudia Clopath, showed that APE alone cannot support reward-guided learning but, when combined with RPE circuitry, consolidates stable sound–action associations in a value-free way.
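
The logic of that modelling result can be illustrated with a toy simulation. The Python sketch below is not the authors' model. It is a minimal two-sound, two-action task in which the learning rates, softmax weights, trial counts, and all function names are assumptions chosen purely for illustration. Each agent can use an RPE-driven value update, an APE-driven habit update, or both.

```python
import math
import random


def run_agent(use_rpe, use_ape, seed, n_trials=600, last=100):
    """Simulate one agent; return its accuracy over the final `last` trials."""
    rng = random.Random(seed)
    alpha_q, alpha_h = 0.2, 0.02   # assumed: values learn fast, habits slowly
    beta_q, beta_h = 3.0, 3.0      # assumed weights of value and habit on choice
    Q = [[0.0, 0.0], [0.0, 0.0]]   # Q[sound][action]: learned value
    H = [[0.0, 0.0], [0.0, 0.0]]   # H[sound][action]: habit strength
    correct_late = 0
    for t in range(n_trials):
        s = rng.randrange(2)       # which sound is played
        # Two-action softmax: probability of choosing action 1.
        diff = beta_q * (Q[s][1] - Q[s][0]) + beta_h * (H[s][1] - H[s][0])
        p1 = 1.0 / (1.0 + math.exp(-diff))
        a = 1 if rng.random() < p1 else 0
        r = 1.0 if a == s else 0.0  # reward only for the matching action
        if use_rpe:
            Q[s][a] += alpha_q * (r - Q[s][a])       # reward prediction error
        if use_ape:
            for b in (0, 1):                          # action prediction error
                H[s][b] += alpha_h * ((1.0 if b == a else 0.0) - H[s][b])
        if t >= n_trials - last:
            correct_late += int(a == s)
    return correct_late / last


def mean_accuracy(use_rpe, use_ape, n_seeds=100):
    """Average late-stage accuracy across independently seeded agents."""
    runs = [run_agent(use_rpe, use_ape, seed) for seed in range(n_seeds)]
    return sum(runs) / n_seeds
```

Calling `mean_accuracy(False, True)` runs agents that have only the APE update. In this toy setup they should average close to chance, because the habit update reinforces whichever action happens to be repeated, rewarded or not. Calling `mean_accuracy(True, True)` runs agents with both updates, where the value system finds the rewarded action first and the habit system then strengthens that same choice. How closely this mirrors the published model is not something the sketch can establish.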

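These results help explain why bad habits are hard to break and why substituting a different action may be the most effective strategy. Because the APE system reinforces frequently repeated actions regardless of their value, a habit may persist even after its outcome stops being rewarding. A consistent replacement action, such as using nicotine gum instead of smoking, can allow the APE system to form a new habitual association over time. The findings also shift attention beyond the nucleus accumbens to the tail of the striatum as a potential therapeutic target for disorders involving habit and compulsion.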

The study may also shed light on Parkinson’s disease. Parkinson’s involves loss of midbrain dopamine neurons, particularly in the substantia nigra pars compacta, many of which are movement-related and may encode APE. This could account for why habitual motor behaviours such as walking are disrupted in Parkinson’s, while novel or flexible movements can remain relatively preserved.

“We now have a theory for paradoxical movement in Parkinson’s,” Dr Stephenson-Jones said. “If the movement-related neurons that support habitual behaviour are lost, habitual movements become compromised while value-based, flexible movements can remain intact. This opens new directions for research and potential interventions.”

The team is continuing to test whether APE is strictly necessary for habit formation and to dissect precisely what each system learns and how they interact over time.

Funding: This research was supported by an EMBO Long-Term Fellowship (ALTF 827-2018), a Swedish Research Council International Postdoc Grant (2020-06365), the Sainsbury Wellcome Centre Core Grant from the Gatsby Charitable Foundation and Wellcome (219627/Z/19/Z), the SWC PhD Programme, and a European Research Council Starting Grant (#557533).

About this habit formation and neuroscience research news

Author: April Cashin-Garbutt
Source: Sainsbury Wellcome Centre
Contact: April Cashin-Garbutt – Sainsbury Wellcome Centre
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Dopaminergic action prediction errors serve as a value-free teaching signal” by Marcus Stephenson-Jones et al., Nature.

Abstract

Dopaminergic action prediction errors serve as a value-free teaching signal

Animal choice behaviour reflects two main tendencies: selecting actions that previously led to rewards and repeating past actions. Theoretical work proposes these strategies are reinforced by distinct dopaminergic teaching signals: reward prediction error (RPE) to reinforce value-based associations and movement-based action prediction error (APE) to support value-free repetition.

Using an auditory discrimination task in mice, the authors show that movement-related dopamine activity in the tail of the striatum encodes an APE signal. Causal manipulation demonstrates this signal functions as a value-free teaching signal that reinforces repeated associations and consolidates stable sound–action mappings. Computational models and experiments indicate APE cannot support reward-guided learning alone but, when paired with RPE circuitry, stabilizes habitual associations. The study concludes that two types of dopaminergic prediction errors operate together to support complementary forms of learning, each reinforcing distinct associations in different striatal regions.