Summary: A new study shows that current machine learning algorithms may not reliably identify the actual brain regions used to process specific syllables. While machine learning can decode mental states from neuroimaging data, it does not necessarily reveal the neural mechanisms the brain uses to perform specific tasks.
Source: University of Geneva.
For roughly the past decade, researchers have applied artificial intelligence methods—especially machine learning—to decode human brain activity. These techniques, when applied to neuroimaging data, can reconstruct aspects of what people see, hear, or think. They can reveal, for instance, that words with related meanings activate neighboring or overlapping zones in the brain. However, researchers from the University of Geneva (UNIGE) and the École normale supérieure (ENS) in Paris report that the brain does not always use the regions identified by machine-learning models to perform a task. By recording brain activity as people judged whether they heard the syllable “BA” or “DA,” the team shows that machine-decoded maps often reflect associative or downstream signals rather than the neural areas that directly carry out the perceptual decision. In short, decoding algorithms are powerful for reading out mental states but can mislead when used to infer the specific information-processing roles of brain regions. The study appears in PNAS.
Recent neuroimaging methods have highlighted how the brain spatially represents speech sounds, producing detailed maps of where different word sounds are encoded. UNIGE scientists asked a straightforward question: when the brain performs a specific task, do those spatial maps indicate which regions are actually used by the brain to make the decision? “We applied multiple human neuroimaging techniques to address this question,” says Anne-Lise Giraud, professor in the Department of Basic Neurosciences at the UNIGE Faculty of Medicine.
A focal region for selecting information
The researchers tested about fifty participants who listened to a continuum of syllables ranging from BA to DA. The syllables in the middle of the continuum were acoustically ambiguous, making the decision difficult in many trials. The team recorded brain activity with both functional MRI (fMRI) and magnetoencephalography (MEG) to compare responses when the acoustic cue was clear with responses when the stimulus was ambiguous and required an active perceptual interpretation.
The findings were striking: regardless of whether the stimulus was easy or ambiguous, the behavioral decision—identifying BA versus DA—consistently engaged a small, focused area of the posterior superior temporal lobe. “The decision consistently recruits a compact region of the posterior superior temporal gyrus,” notes Anne-Lise Giraud.
The researchers further confirmed this focal role by studying a patient with damage to that specific posterior superior temporal region. Although the patient showed no obvious symptoms in daily life, they could no longer reliably distinguish BA from DA. “This clinical observation supports the conclusion that this localized area is critical for processing these phonemic distinctions,” adds Sophie Bouton, a member of Giraud’s team.
The “false positives” of machine learning decoding
To examine whether syllable identity is encoded only in that focal region or distributed more widely—as some machine-learning maps suggest—the team recorded from patients who had intracranial electrodes implanted for clinical reasons. These direct neural recordings provide very precise, contact-by-contact measurements. A traditional univariate analysis of those recordings showed that only electrode contacts in the posterior superior temporal lobe responded during the syllable identification task, corroborating the fMRI and MEG results.
However, when the researchers applied multivariate machine-learning decoders to the same data, classifiers produced significant decoding results across much of the temporal lobe and even beyond it. “Learning algorithms are highly sensitive and will exploit any informative patterns in the signals, but they do not distinguish whether the information they use was actually required to make the decision or is a downstream consequence of that decision,” Giraud explains. Valérian Chambon from the ENS adds that the broader, machine-decoded maps therefore include regions that reflect the outcome or consequences of the perceptual decision rather than the neural computations that produced it.
In other words, many of the regions highlighted by machine-learning decoding are effectively “false positives” with respect to the causal site of syllable categorization: they retain information about which choice the subject made (BA or DA), but they are not necessary for performing the perceptual task itself.
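The logic of such "false positives" can be illustrated with a toy simulation. The sketch below is not the authors' analysis. It assumes a hypothetical "focal" region whose noisy activity alone determines the BA/DA choice, a "downstream" region that only receives a weak echo of the choice after it has been made, and a "control" region containing pure noise. The region names, channel counts, signal strengths, and the simple nearest-centroid decoder are all illustrative assumptions.

```python
import random


def simulate_trials(n_trials=2000, n_channels=5, seed=0):
    """Simulate per-trial activity in three hypothetical regions."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        stimulus = rng.choice([-1, 1])  # -1 = "BA", +1 = "DA"
        # Focal region: noisy sensory evidence about the stimulus.
        focal = [stimulus + rng.gauss(0, 1.0) for _ in range(n_channels)]
        # The choice is read out from the focal region only (by construction).
        choice = 1 if sum(focal) > 0 else -1
        # Downstream region: weak, noisy echo of the choice already made.
        downstream = [0.5 * choice + rng.gauss(0, 1.0) for _ in range(n_channels)]
        # Control region: noise unrelated to stimulus or choice.
        control = [rng.gauss(0, 1.0) for _ in range(n_channels)]
        trials.append({"choice": choice, "focal": focal,
                       "downstream": downstream, "control": control})
    return trials


def centroid(rows):
    """Channel-wise mean of a list of activity vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode_accuracy(trials, region):
    """Train a nearest-centroid decoder on the first half of the trials
    and report its accuracy at predicting the choice on the second half."""
    half = len(trials) // 2
    train, test = trials[:half], trials[half:]
    c_da = centroid([t[region] for t in train if t["choice"] == 1])
    c_ba = centroid([t[region] for t in train if t["choice"] == -1])
    correct = 0
    for t in test:
        is_da = squared_distance(t[region], c_da) < squared_distance(t[region], c_ba)
        predicted = 1 if is_da else -1
        correct += predicted == t["choice"]
    return correct / len(test)


trials = simulate_trials()
accuracy = {region: decode_accuracy(trials, region)
            for region in ("focal", "downstream", "control")}
for region, acc in accuracy.items():
    print(f"{region:>10}: {acc:.2f}")
```

Under these assumptions the decoder performs well above chance on the downstream region, even though that region plays no part in producing the choice in the simulation, while the control region stays near chance. Significant decoding therefore shows that information about the choice is present in a region, not that the region is needed to make it.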

These results clarify how the brain represents speech sounds and highlight a key limitation of machine-learning approaches when used to infer functional organization. Decoding tools can reveal where information about a decision is present, but they may not indicate which regions are performing the underlying computations.
Publisher: Organized by NeuroscienceNews.com
Original research: Open access research published in PNAS.
DOI: 10.1073/pnas.1714279115
Focal versus distributed temporal cortex activity for speech sound category assignment
Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with classical ideas of hierarchical processing and efficient coding, particularly in speech. Using fMRI and magnetoencephalography during syllable identification, the authors show that sensory and decision-related activity colocalize to a restricted portion of the posterior superior temporal gyrus (pSTG). Intracortical recordings reveal that early, focal activity in this region distinguishes correct from incorrect decisions and can be decoded by machine learning to classify syllables. Crucially, significant machine decoding was also possible from activity sampled across other temporal and frontal regions despite weak or absent sensory or decision-related responses there. These findings indicate that speech-sound categorization relies on an efficient readout of focal pSTG signals, while broader, distributed activity patterns—although classifiable by machine learning—primarily reflect collateral perceptual and decision-related processes rather than the core categorization mechanism.