Summary: With Dr. Seuss’s The Lorax as a naturalistic stimulus, researchers at the University of Rochester used fMRI to show that the brain engages a broad network of regions when people listen to and watch narrative speech. The results show that audiovisual speech perception recruits not only classic multisensory integration areas but also wider semantic, sensory, and affective systems, offering new directions for studying neurodevelopmental conditions.
Source: University of Rochester
Researchers from the Del Monte Institute for Neuroscience at the University of Rochester report new insights into how the brain processes complex audiovisual speech.
Published in NeuroImage, this study examines how watching and listening to a narrator telling a story engages an extensive collection of brain regions associated with sensory processing, multisensory integration, language, and cognitive functions supporting comprehension of narrative content.
Understanding this larger network helps researchers develop more targeted approaches for studying how sensory integration develops and how it may go awry in neurodevelopmental disorders.
“Multisensory integration is an important function of our nervous system as it can substantially enhance our ability to detect and identify objects in our environment,” said Lars Ross, Ph.D., research assistant professor of Imaging Sciences and Neuroscience and first author of the study. He added that a failure of this function can create an overwhelming sensory environment and may underlie difficulties in adapting to surroundings, a problem associated with some neurodevelopmental conditions such as autism.
Using blood-oxygen-level-dependent functional MRI (BOLD fMRI), the team recorded brain activity from 53 adult participants while they viewed a recorded reading of The Lorax. The story was presented in four conditions, in random order: audio-only, visual-only, synchronous audiovisual (matched audio and video), and asynchronous audiovisual (mismatched timing). The researchers also tracked participants’ eye movements to verify visual attention during the task.
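For readers curious what a randomized four-condition design can look like in practice, here is a minimal sketch. The article does not say how the order was generated, or whether it was randomized per participant or per story segment, so the per-participant shuffle, the function name `condition_order`, and the seeding scheme are all assumptions made for illustration. Only the four condition labels come from the study description.

```python
import random

# The four condition labels follow the article; everything else is hypothetical.
CONDITIONS = ["audio_only", "visual_only", "av_synchronous", "av_asynchronous"]


def condition_order(participant_id: int, seed: int = 0) -> list[str]:
    """Return a reproducible random order of the four conditions.

    Assumption: one shuffle per participant. The actual study may have
    randomized at a different level (for example, per story segment).
    """
    # Derive a per-participant generator so the order can be regenerated later.
    rng = random.Random(seed * 100_003 + participant_id)
    order = CONDITIONS.copy()
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    for pid in range(1, 4):
        print(pid, condition_order(pid))
```

Seeding the generator from the participant number keeps each order reproducible, which makes it straightforward to check afterwards which condition a given recording belongs to.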

Consistent with expectations, the investigators observed multisensory enhancement in classic integration sites, including regions of the posterior superior temporal gyrus. Importantly, they also found that viewing the speaker’s facial movements increased activity across a broader semantic network and in extralinguistic regions not typically highlighted in multisensory studies—most notably the amygdala and primary visual cortex.
The analyses additionally revealed engagement of thalamic nuclei along both visual and auditory pathways. These thalamic regions are usually associated with early sensory processing and are likely to be important early convergence points where information from the eyes and ears interacts.
“This suggests many regions beyond traditional multisensory sites play a role in how the brain processes complex, naturalistic multisensory speech,” Ross said, noting that the network includes areas tied to sensory perception, emotional evaluation, and higher-level cognitive processing.
The experiment was deliberately designed with pediatric applications in mind. The team has already begun parallel work with children and with adults on the autism spectrum to chart how audiovisual speech processing develops and differs across populations.
“Our lab is profoundly interested in this network because it goes awry in a number of neurodevelopmental disorders,” said John Foxe, Ph.D., lead author of the study. With a detailed map of multisensory speech-related circuitry, researchers can pose more specific questions about which circuits may be altered in conditions such as autism and dyslexia.
Additional co-authors on the paper include Sophie Molholm, Ph.D., and Victor Del Bene of Albert Einstein College of Medicine, and John Butler, Ph.D., of Technological University Dublin. The study represents a collaboration between two Intellectual and Developmental Disability Research Centers (IDDRCs) supported by the National Institute of Child Health and Human Development (NICHD). In 2020, the University of Rochester was designated an IDDRC by NICHD in recognition of the Medical Center’s leadership in research on conditions such as autism, Batten disease, and Rett syndrome. Sophie Molholm is co-director of the Rose F. Kennedy IDDRC at Einstein.
About this speech processing research news
Author: Kelsie Smith Hayduk
Contact: Kelsie Smith Hayduk – University of Rochester
Original Research: Open access. “Neural correlates of multisensory enhancement in audiovisual narrative speech perception: A fMRI investigation” by Lars Ross et al., NeuroImage. DOI: 10.1016/j.neuroimage.2022.119598
Abstract
Neural correlates of multisensory enhancement in audiovisual narrative speech perception: A fMRI investigation
This functional MRI study examined how seeing a speaker’s articulatory movements while listening to a continuous, naturalistic narrative affects brain activity. The investigators aimed to identify regions within the language network that show enhanced responses during synchronous audiovisual speech compared with unimodal or asynchronous presentations.
The authors hypothesized that enhancement would appear not only in established audiovisual integration sites—such as posterior superior temporal regions—but also in parts of the broader semantic system. To test this, 53 participants heard and/or watched a continuous spoken story under four conditions: auditory only, visual only, synchronous audiovisual, and asynchronous audiovisual, while BOLD fMRI recorded brain activity.
Results revealed multisensory enhancement across an extensive network including classical multisensory integration areas and elements of the semantic network, as well as extralinguistic regions not typically associated with multisensory integration, specifically primary visual cortex and bilateral amygdala. The analyses also indicated involvement of thalamic nuclei along visual and auditory pathways, structures more commonly linked to early sensory processing.
The authors conclude that under natural listening conditions, multisensory enhancement is not confined to canonical integration sites but extends across a wide semantic network and into regions implicated in sensory, perceptual, and cognitive processes that lie outside traditional language areas. This expanded perspective on audiovisual speech processing provides a richer framework for investigating how these networks develop and malfunction in neurodevelopmental disorders.