Summary: Researchers have discovered two distinct groups of neurons in the ventral striatum that separately encode outcomes that are better than expected and worse than expected. These neural populations together represent the full range of possible rewards a decision can yield, allowing the brain to evaluate risk and reward more precisely. In mice, inhibiting either population changed how the animals anticipated rewards and altered their choice-related behavior.
The work supports a view of decision-making that mirrors developments in machine learning: the brain may track a distribution of potential outcomes rather than a single average value. If similar circuitry operates in humans, these findings could help explain impaired risk evaluation observed in disorders such as depression and addiction. Future work will investigate how uncertainty and variability influence this opposing neural circuitry.
Key Facts:
- Two neural populations: One ensemble preferentially signals outcomes better than expected, while the other preferentially signals worse-than-expected outcomes.
- Decision-making model: Together, these groups provide a representation of the full distribution of possible rewards, not only the mean.
- Clinical relevance: The findings suggest a neural basis for altered risk assessment in neuropsychiatric conditions and may guide future therapeutic research.
Source: Harvard
Every day, our brains weigh thousands of choices, from small preferences such as where to eat to major life decisions like changing careers or relocating. Each choice carries the potential for outcomes that are better or worse than anticipated.
How the brain gauges risk versus reward in these moment-to-moment decisions remains a core question in neuroscience. A new study by researchers at Harvard Medical School and Harvard University sheds light on this process by combining ideas from modern machine learning with experiments in mice.

Published Feb. 19 in Nature and supported in part by federal funding, the study applied distributional concepts from machine learning to probe how reward-based decisions are encoded in the brain. The team looked specifically at the ventral striatum, a central hub for reward representation and decision-related learning.
The experiments revealed two functionally distinct groups of neurons. One population preferentially represents outcomes that exceed expectations, while the other emphasizes outcomes that fall short. When combined, the two populations form a richer internal estimate of possible outcomes than the single-value estimates assumed by many traditional reinforcement-learning models.
“Our results suggest that mice—and likely other mammals—retain more detailed information about the variability of rewards than previously appreciated,” said Jan Drugowitsch, co-senior author and associate professor of neurobiology at Harvard Medical School’s Blavatnik Institute.
If this pattern holds in humans, the findings offer a concrete neural mechanism for how people evaluate risk and reward and may help explain why disruptions to reward circuitry are linked to poor judgment in conditions such as depression and addiction.
Machine learning informs neuroscience
Traditional models of decision-making often compress past outcomes into an average expected value. But real-world choices frequently involve different degrees of variability: two options can share the same mean reward while offering very different chances of excellent or poor outcomes. People and animals routinely prefer or avoid such variability, indicating they encode more than a single mean value.
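A small Python sketch makes the equal-mean point concrete. The two options, their reward values, the thresholds for an "excellent" or "poor" outcome, and the `summarize` helper are all hypothetical illustrations, not data or code from the study: both options average 4 units of reward, so a mean-only learner cannot tell them apart, but only the risky one ever delivers an excellent or a poor outcome.

```python
from statistics import mean, pvariance

# Illustrative, equally likely reward outcomes (arbitrary units); not data from the study.
SAFE_OPTION = [4, 4, 4, 4]
RISKY_OPTION = [1, 1, 7, 7]


def summarize(outcomes, good_threshold, bad_threshold):
    """Summarize equally likely outcomes by mean, variance, and tail probabilities."""
    n = len(outcomes)
    return {
        "mean": mean(outcomes),
        "variance": pvariance(outcomes),
        # Chance of an excellent outcome (at or above good_threshold).
        "p_excellent": sum(o >= good_threshold for o in outcomes) / n,
        # Chance of a poor outcome (at or below bad_threshold).
        "p_poor": sum(o <= bad_threshold for o in outcomes) / n,
    }


for name, outcomes in [("safe", SAFE_OPTION), ("risky", RISKY_OPTION)]:
    print(name, summarize(outcomes, good_threshold=7, bad_threshold=1))
```

A model that stores only the mean sees these two options as identical; the variance and the tail probabilities are what distinguish them.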
Recent advances in machine learning introduced distributional reinforcement learning, in which agents learn the entire probability distribution of rewards for each action, not just the average. Algorithms using this distributional approach have outperformed mean-based methods across games and tasks where outcomes vary widely.
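One common way to learn more than the mean is expectile-style learning with asymmetric learning rates. The Python sketch below illustrates that scheme and is not presented as the study's model; the function name `learn_expectiles`, the learning rate, the step count, and the 1-or-7 reward are assumptions chosen for illustration. Each hypothetical unit has its own asymmetry level `tau` and scales positive prediction errors by `tau` and negative ones by `1 - tau`. A unit with `tau = 0.5` learns the mean, units with larger `tau` settle above it, and units with smaller `tau` settle below it.

```python
import random


def learn_expectiles(rewards, taus, alpha=0.005, n_steps=20_000, seed=0):
    """Learn one value estimate per asymmetry level tau from sampled rewards.

    Each estimate scales positive prediction errors by tau and negative ones
    by (1 - tau), so it drifts toward the tau-expectile of the reward
    distribution. All outcomes in `rewards` are treated as equally likely.
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(n_steps):
        r = rng.choice(rewards)  # sample one outcome
        for i, tau in enumerate(taus):
            delta = r - values[i]  # reward prediction error for this unit
            weight = tau if delta > 0 else 1.0 - tau  # asymmetric learning rate
            values[i] += alpha * weight * delta
    return values


taus = [0.1, 0.5, 0.9]
estimates = learn_expectiles([1, 7], taus)
for tau, v in zip(taus, estimates):
    print(f"tau={tau}: {v:.2f}")
```

For a reward of 1 or 7 with equal probability, the tau-expectile works out to 1 + 6 * tau, so the three estimates should land near 1.6, 4.0, and 6.4, with sampling noise keeping them slightly off. A fixed learning rate keeps the rule a simple delta-rule update; a decaying rate would shrink the residual noise.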
Earlier work reanalyzing neural data suggested dopamine signals align better with distributional predictions than with mean-based models. Building on that insight, the current study directly searched for neural representations of reward distributions in the striatum.
Probing the ventral striatum in mice
To test how distributional information is stored, the researchers trained mice to associate specific odors with rewards of varying magnitude, thereby teaching the animals the range of outcomes tied to each cue. While monitoring licking behavior (a proxy for anticipated reward), they recorded neural activity in the ventral striatum using high-density probes and calcium imaging.
Two distinct neuronal groups emerged. One population, which behaved like an “optimist,” encoded signals consistent with better-than-expected rewards; the other, a “pessimist,” encoded worse-than-expected outcomes. Silencing the optimistic group made mice behave as if they expected less valuable rewards; silencing the pessimistic group had the opposite effect, increasing anticipated value.
“You can think of the system as having internal advisors with opposing outlooks, which together form a detailed map of potential outcomes,” Drugowitsch said. These opposing signals allow the brain to represent both tails of the reward distribution and thus make more nuanced choices between safe and risky options.
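A toy Python sketch can show why silencing one side shifts anticipated value in the directions the silencing experiments revealed, under assumptions that do not come from the paper: each hypothetical unit holds the exact expectile of the reward distribution for its own asymmetry level `tau` (above 0.5 plays the "optimist," below 0.5 the "pessimist"), and a hypothetical downstream `readout` simply averages whatever units remain. The reward of 1 or 7 units with equal probability is also invented for illustration.

```python
# Hypothetical population of units, each tuned to a different asymmetry level tau.
# tau > 0.5 plays the "optimist" role, tau < 0.5 the "pessimist" role.
TAUS = [0.1, 0.3, 0.5, 0.7, 0.9]
# Hypothetical reward: 1 or 7 units, equally likely.
OUTCOMES = [1, 7]


def expectile(outcomes, tau, tol=1e-9):
    """Find the tau-expectile of equally likely outcomes by bisection.

    The expectile e balances tau * E[(r - e)+] against (1 - tau) * E[(e - r)+];
    sums are used instead of means because the 1/n factor cancels.
    """
    lo, hi = min(outcomes), max(outcomes)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        above = sum(max(r - mid, 0.0) for r in outcomes)  # upside: rewards exceeding mid
        below = sum(max(mid - r, 0.0) for r in outcomes)  # downside: rewards short of mid
        if tau * above > (1 - tau) * below:
            lo = mid  # weighted upside still dominates, so the expectile lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def readout(taus, outcomes):
    """Hypothetical downstream readout: average the estimates of the surviving units."""
    values = [expectile(outcomes, tau) for tau in taus]
    return sum(values) / len(values)


intact = readout(TAUS, OUTCOMES)
no_optimists = readout([t for t in TAUS if t <= 0.5], OUTCOMES)   # "silence" tau > 0.5
no_pessimists = readout([t for t in TAUS if t >= 0.5], OUTCOMES)  # "silence" tau < 0.5
print(f"intact: {intact:.2f}, optimists silenced: {no_optimists:.2f}, "
      f"pessimists silenced: {no_pessimists:.2f}")
```

With these assumptions the unit estimates are 1.6, 2.8, 4.0, 5.2, and 6.4, so the intact readout equals the mean (4.0), removing the optimists lowers it to 2.8, and removing the pessimists raises it to 5.2. The real circuit and its downstream readout are certainly more complex than an average.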
The authors plan to extend their work to understand how uncertainty and variability across more complex options are encoded, and how these mechanisms contribute to general reasoning and decision-making.
While more research is needed to confirm these mechanisms in humans and to account for the full complexity of human choices, the parallels between rodent and human reward systems suggest this work may help explain altered risk evaluation in psychiatric conditions.
Authorship, funding, disclosures
Additional authors include Adam Lowet, Qiao Zheng, Melissa Meng, and Sara Matias.
Funding: The study received support from the National Institutes of Health (R01NS116753; F31NS124095), the Human Frontier Science Program (LT000801/2018), the Harvard Brain Science Initiative, and the Brain & Behavior Research Foundation.
About this decision-making and neuroscience research news
Author: Dennis Nealon
Contact: Dennis Nealon – Harvard
Original Research: Closed access.
“An opponent striatal circuit for distributional reinforcement learning” by Jan Drugowitsch et al., published in Nature.
Abstract
An opponent striatal circuit for distributional reinforcement learning
Machine learning has shown large gains by expanding learning targets from mean rewards to full reward probability distributions—a strategy called distributional reinforcement learning. The mesolimbic dopamine system has been linked to updating mean-value representations in the striatum, but whether and how striatal neurons encode higher-order features of reward distributions has been unclear.
Using high-density Neuropixels recordings in mice performing a conditioning task that independently manipulated reward mean, variance, and stimulus identity, the authors found clear evidence that striatal activity encodes variance alongside mean value. Chronic removal of dopamine inputs disrupted these distributional representations without eliminating mean-value coding.
Two-photon calcium imaging and optogenetics showed that the two major classes of striatal medium spiny neurons—D1 and D2—preferentially encode opposite tails of the reward distribution. The findings are integrated into a model in which opponency between D1 and D2 neurons enables the striatum and mesolimbic dopamine system to implement the computational benefits of distributional reinforcement learning.