How Vision Aids Voice Recognition in the Brain

Summary: New evidence suggests voice and face recognition share closer neural ties than previously believed.

Source: University of Pittsburgh

Researchers reporting in the Journal of Neurophysiology found that the brain region best known for processing faces also responds when people hear familiar voices, suggesting visual and auditory person recognition converge in overlapping neural circuits.

A new study from the University of Pittsburgh, published recently in the Journal of Neurophysiology, provides direct neural evidence that voice recognition and face recognition are more closely linked than previously appreciated. The findings point to a shared processing pathway that helps us identify people by combining auditory and visual information.

Behavioral research has long shown that people recognize familiar voices faster and more accurately when they can associate them with a corresponding face, but the neural basis for this advantage has been unclear. According to senior author Taylor Abel, M.D., associate professor of neurological surgery at the University of Pittsburgh School of Medicine, the new recordings reveal that areas of the visual cortex specialized for faces become active in response to heard voices, highlighting a close interaction between sensory systems.

Traditionally, neuroscientists treated auditory and visual processing systems as largely separate, each occupying distinct brain territories. Only a handful of studies have recorded directly from the cortical regions responsible for face perception to test whether those regions also respond to familiar voices. This study advances that approach by using high-resolution electrocorticography (ECoG) recordings from clinical patients.

The research team had a unique opportunity to record neural activity from patients undergoing evaluation for epilepsy surgery, in whom electrodes are temporarily implanted to localize seizure origins. Five adult participants consented to perform a person-identification task while cortical activity was recorded from the fusiform gyri (FG), the bilateral temporal lobe regions strongly associated with face processing.

Although auditory and visual processing have often been studied separately, this research shows they can interact within the fusiform gyri. Image is in the public domain.

During the task, participants viewed photographs of three well-known U.S. presidents (Bill Clinton, George W. Bush, and Barack Obama) or listened to short clips of their voices, and identified the person shown or heard. Electrocorticography recordings from face-responsive sites in and near the fusiform gyrus showed clear visual responses when participants viewed the portraits and, importantly, measurable responses when they heard familiar voices.

Voice-evoked responses in the FG were consistently lower in amplitude and emerged later in time than the visual responses, typically appearing in the 300–600 millisecond range after voice onset. These temporal dynamics support the idea that voice-related activity in FG may reflect top-down feedback from higher-order person-recognition networks rather than purely bottom-up auditory input.

“The results demonstrate that auditory and visual systems interact very early during person identification and do not operate in isolation,” Dr. Abel said. He added that understanding these interactions helps explain clinical disorders in which face or voice recognition is impaired, such as certain dementias and related conditions, and may inform diagnostic and therapeutic approaches for such deficits.

Co-first authors on the study are Ariane Rhone, Ph.D., of the University of Iowa, and Kyle Rupp, Ph.D., of the University of Pittsburgh. Other contributors include Dan Tranel, Ph.D., and Matthew Howard, III, Ph.D., both of the University of Iowa; and Jasmine Hect, Ph.D., and Emily Harford, Ph.D., both of the University of Pittsburgh. The research was supported by National Institutes of Health grants R01 DC004290 and R21 DC019217.

About this auditory and visual neuroscience research news

Author: Anastasia Gorelova
Source: University of Pittsburgh
Contact: Anastasia Gorelova – University of Pittsburgh
Image: The image is in the public domain

Original Research: Closed access. “Electrocorticography reveals the dynamics of famous voice responses in human fusiform gyrus” by Ariane Rhone et al., Journal of Neurophysiology

Abstract

Electrocorticography reveals the dynamics of famous voice responses in human fusiform gyrus

Voice and face recognition depend on convergent neural mechanisms that together support efficient speaker identification. Neuroimaging has hinted that processing familiar voices engages early visual cortex regions, including the bilateral fusiform gyri (FG) on the ventral temporal lobe. Yet it remained unclear whether those visual areas actively contribute to voice recognition and whether their voice-evoked activity is driven bottom-up or via top-down feedback.

This study directly examined neural responses to familiar voices and faces in human fusiform gyrus using electrocorticography in epilepsy surgery patients. Recordings from five adult patients performing a person-identification task with visual and auditory stimuli from famous speakers (U.S. presidents Barack Obama, George W. Bush, and Bill Clinton) showed that a subset of face-responsive cortical sites also responded to familiar voices. Voice responses were lower in magnitude and delayed relative to visual responses, consistent with a top-down feedback-mediated contribution from FG that may facilitate speaker identification.