Summary: A new machine-learning study decodes behavioral strategies in worms, offering insights that may help explain how animals, including humans, make decisions.
Source: Kyoto University.
Researchers present a machine-learning method to infer animal preferences from behavior; the findings offer new perspectives on decision-making processes.
Interpreting what animals want is a familiar challenge for pet owners and scientists alike. Classical experiments, such as Pavlov’s conditioning of dogs to link a bell with food, show that animals can learn associations between cues and rewards. Yet in natural settings, the nature of rewards and how they motivate behavior are often ambiguous. To study preference and decision-making in freely behaving animals, researchers need tools that can infer the value of potential rewards directly from observed behavior.
In a study published in PLOS Computational Biology, a team from Kyoto University’s Graduate School of Biostudies applied a machine-learning framework to investigate how the nematode Caenorhabditis elegans evaluates potential rewards by analyzing its movement patterns across temperature gradients.
Lead author Shoichiro Yamaguchi explains that conventional behavioral models assume known rewards and therefore fall short when animals move freely without explicit cues. “We approached the problem in reverse,” he says, “using the animal’s behavior to infer the values it assigns to different sensory inputs.”
The researchers tracked heat-sensitive worms that had been cultivated either with food or without food at a specific temperature, then observed how these worms navigated plates containing a range of surface temperatures. Automated tracking captured the animals’ positions and the temperatures they experienced over time, producing time-series data that could be analyzed with inverse reinforcement learning.
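As a rough illustration of how tracked positions could become the kind of time series described above, the sketch below converts positions along a plate into the temperature experienced and its rate of change. It assumes a linear gradient along one axis and a fixed sampling interval, neither of which is stated in the article; the function names and the numbers are hypothetical and are not taken from the study.

```python
def temperature_series(xs_cm, t_left_c, gradient_c_per_cm):
    """Temperature at each tracked position, assuming a linear gradient along x."""
    return [t_left_c + gradient_c_per_cm * x for x in xs_cm]


def rate_of_change(temps_c, dt_s):
    """Finite-difference estimate of dT/dt between consecutive samples."""
    return [(later - earlier) / dt_s for earlier, later in zip(temps_c, temps_c[1:])]


def to_states(xs_cm, t_left_c, gradient_c_per_cm, dt_s):
    """Pair each temperature with the rate of change that led to it."""
    temps = temperature_series(xs_cm, t_left_c, gradient_c_per_cm)
    rates = rate_of_change(temps, dt_s)
    # The first sample has no preceding sample, so it has no rate and is dropped.
    return list(zip(temps[1:], rates))


# Made-up track: x positions in cm, sampled once per second.
states = to_states([0.0, 1.0, 3.0], t_left_c=17.0, gradient_c_per_cm=0.5, dt_s=1.0)
# states == [(17.5, 0.5), (18.5, 1.0)]
```

Each resulting pair holds a temperature and a rate of temperature change, the two quantities the article says the analysis considered.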
Worms that had been fed at a given cultivation temperature tended to move toward that temperature zone after being placed on a thermal gradient. Starved worms, in contrast, moved away from the cultivation temperature. Applying their inverse reinforcement learning (IRL) model, the team inferred the underlying behavioral strategies guiding these movements.
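The idea of working backward from observed movements to inferred values can be illustrated with a toy example. The sketch below is not the formulation used in the paper: it is a deliberately simplified stand-in that fits one preference score per temperature bin by maximum likelihood, assuming the animal chooses among staying or moving to an adjacent bin with softmax probabilities. The bin layout, the transition counts, and all function names are invented for illustration.

```python
import math


def neighbors(i, n_bins):
    """Bins reachable in one step: stay put or move to an adjacent bin."""
    return [j for j in (i - 1, i, i + 1) if 0 <= j < n_bins]


def choice_probs(i, w):
    """Softmax over the reachable bins, weighted by preference scores w."""
    nbrs = neighbors(i, len(w))
    top = max(w[j] for j in nbrs)  # subtract the max for numerical stability
    exps = [math.exp(w[j] - top) for j in nbrs]
    total = sum(exps)
    return {j: e / total for j, e in zip(nbrs, exps)}


def fit_preferences(counts, n_bins, lr=0.5, n_iter=2000):
    """Gradient ascent on the mean log-likelihood of observed transitions.

    counts maps (from_bin, to_bin) to how often that move was observed.
    Scores are only identifiable up to an additive constant; starting
    from zeros keeps their sum at zero.
    """
    w = [0.0] * n_bins
    n_total = sum(counts.values())
    departures = [0] * n_bins
    arrivals = [0] * n_bins
    for (i, j), n in counts.items():
        departures[i] += n
        arrivals[j] += n
    for _ in range(n_iter):
        # Expected arrivals in each bin under the current scores.
        expected = [0.0] * n_bins
        for i in range(n_bins):
            if departures[i] == 0:
                continue
            for j, p in choice_probs(i, w).items():
                expected[j] += departures[i] * p
        # Gradient of the log-likelihood is observed minus expected arrivals.
        w = [w[m] + lr * (arrivals[m] - expected[m]) / n_total for m in range(n_bins)]
    return w


# Invented counts for five temperature bins; moves mostly head toward bin 2.
toy_counts = {
    (0, 1): 8, (0, 0): 2,
    (1, 2): 8, (1, 1): 1, (1, 0): 1,
    (2, 2): 8, (2, 1): 1, (2, 3): 1,
    (3, 2): 8, (3, 3): 1, (3, 4): 1,
    (4, 3): 8, (4, 4): 2,
}
scores = fit_preferences(toy_counts, n_bins=5)
preferred_bin = max(range(5), key=lambda m: scores[m])
```

With these invented counts the highest score lands on the bin the simulated animal tends to approach. A fuller treatment would also include the rate of temperature change in the state, as the study describes.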
The model indicated that fed worms used two complementary pieces of information: the absolute temperature they experienced and the rate of temperature change as they moved. These signals were combined into a strategy that balanced directed migration—moving efficiently to a specific temperature—and isothermal migration—tracking along a constant temperature. This combination allowed fed worms to reach preferred temperatures using minimal effort, a pattern resembling efficient, goal-directed decision-making.

Starved worms showed a different strategy: they relied primarily on absolute temperature and not on temporal changes in temperature, using this information to escape zones they judged unlikely to contain food. The contrast between fed and starved animals highlights how internal states—such as hunger—alter the sensory cues and strategies that animals use to navigate their environment.
Senior scientist Honda Naoki emphasizes that the IRL-based approach reproduces simple worm behaviors while revealing the computational rules behind them. “By combining behavioral analysis with neural measurements in freely moving animals, this method can deepen our understanding of decision-making mechanisms in more complex brains and inform developments in artificial intelligence,” he says.
Funding: This research was supported by the Japan Society for the Promotion of Science, Kyoto University, and the Japan Agency for Medical Research and Development.
Source and Authors: Raymond Kunikane Terhune; original research by Shoichiro Yamaguchi, Honda Naoki, Muneki Ikeda, Yuki Tsukada, Shunji Nakano, Ikue Mori, and Shin Ishii. The work appears in PLOS Computational Biology under the title “Identification of animal behavioral strategies by inverse reinforcement learning.” Published May 2, 2018.
Identification of animal behavioral strategies by inverse reinforcement learning
Animals control diverse behaviors to reach desired states in their environments. Identifying the strategies that govern these behaviors is central to understanding decision-making and the information processing performed by nervous systems, but tools to quantify such strategies from observed behavior are limited. Here, the authors develop an inverse reinforcement-learning framework to extract an animal’s behavioral strategy from time-series movement data and apply it to thermotactic behavior in C. elegans. After cultivation at a constant temperature with or without food, fed worms preferentially move toward the cultivation temperature on a thermal gradient, whereas starved worms avoid it. The IRL analysis reveals that fed worms integrate both absolute temperature and its temporal derivative, combining two strategies—directed migration and isothermal migration—to reach preferred temperatures efficiently. Starved worms, by contrast, rely mainly on absolute temperature to escape the cultivation zone. Applying the method to animals lacking specific thermosensory neurons further links these behavioral strategies to neural substrates. The IRL-based approach provides a general tool for inferring behavioral strategies from movement data and can be applied broadly to studies of decision-making across species.
The findings advance methods for decoding animal preferences and offer a framework that can bridge behavioral analysis, neural measurement, and artificial intelligence research.