AI-Generated Voices Fool People While Brain Activity Differs

Summary: People struggle to reliably tell human voices from AI-generated voices, correctly identifying them roughly half the time. Yet brain scans show distinct neural patterns: human voices engage regions tied to memory and empathy, while AI voices activate areas involved in error detection and focused attention.

These results underscore both the sophistication of modern AI voice synthesis and the subtle ways our brains differentiate synthetic speech from human speech. The team next plans to examine whether personality traits influence a person's ability to recognize where a voice comes from.

Key Facts:

  1. Identification accuracy: Participants correctly identified human voices 56% of the time and AI voices 50.5% of the time.
  2. Distinct neural responses: Human voices produced stronger activity in brain areas linked to memory and empathy, while AI voices produced greater activation in regions associated with error monitoring and attention control.
  3. Perception bias by emotion: Neutral voices were often judged to be AI, whereas happy voices were more likely to be perceived as human.

Source: FENS

Overview of the study

Researchers presenting at the Federation of European Neuroscience Societies (FENS) Forum 2024 report that people find it difficult to distinguish modern AI-generated voices from real human voices, but the brain responds differently to the two. The study was led by doctoral researcher Christine Skjegstad and Professor Sascha Frühholz from the Department of Psychology at the University of Oslo (UiO), Norway.

Skjegstad notes that recent advances in machine learning allow accurate voice cloning from only a few seconds of audio, a capability scammers already exploit. While technical methods for detecting synthetic voices are under development, comparatively little is known about how the human brain reacts to AI-generated speech.

For happy human voices, the correct identification rate was 78%, compared with only 32% for happy AI voices, suggesting that people associate happiness with human speakers. Credit: Neuroscience News

The experiment recruited 43 participants who listened to recordings of human and AI-generated voices expressing five emotions: neutrality, anger, fear, happiness, and pleasure. While participants judged each clip as human or synthetic, researchers recorded their brain activity using functional magnetic resonance imaging (fMRI). Participants also rated each voice for perceived naturalness, trustworthiness, and authenticity.

Overall identification performance was near chance. Participants correctly labeled human voices 56% of the time and AI voices 50.5% of the time, indicating similar difficulty with both categories. Neutral AI voices were the easiest to identify as synthetic: they were correctly classified as AI 75% of the time, whereas neutral human voices were recognized as human only 23% of the time. Happy human voices were most often recognized as human (78%), whereas happy AI voices were identified as AI only 32% of the time, a bias that links positive emotion to human origin.

In the ratings, neutral human and neutral AI voices both scored lowest for naturalness, trustworthiness, and authenticity, while happy human voices scored highest. Female neutral AI voices were identified correctly more often than male neutral AI voices, a pattern the authors suggest may reflect public familiarity with female-voiced digital assistants.

Despite the low behavioral accuracy, fMRI results revealed clear differences in neural activation. Human voices produced stronger responses in regions associated with memory (right hippocampus) and social understanding or empathy (right inferior frontal gyrus). In contrast, AI voices increased activity in areas linked to error detection (right anterior mid-cingulate cortex) and attention control (right dorsolateral prefrontal cortex), suggesting listeners may become more vigilant or evaluative when hearing synthetic speech.

Skjegstad commented that participants frequently reported difficulty deciding whether a voice was human or synthetic, reinforcing the idea that current AI voice systems can closely emulate human characteristics. Still, the distinct brain responses suggest that the two kinds of voices are processed differently: AI voices may heighten alertness, while human voices may promote a sense of connection.

Next steps for the research team include investigating whether individual differences—such as extraversion, empathy, or other personality traits—affect a person’s sensitivity to identifying human versus AI voices. Understanding these factors could inform both technology design and public awareness strategies.

Professor Richard Roche, chair of the FENS Forum communication committee and Deputy Head of the Department of Psychology at Maynooth University (not involved in the study), emphasized the importance of studying neural responses to AI voices as synthesis technologies grow more capable. He noted both risks, such as fraud and deception, and benefits, including voice restoration for people who have lost their natural voice and potential therapeutic uses in mental health care.

About this AI and neuroscience research news

Author: Kerry Noble
Source: FENS
Contact: Kerry Noble – FENS
Image: Credit: Neuroscience News

Original Research: Findings presented at FENS Forum 2024