Decoding Spoken Sentences from Brain Signals

Spoken sentences reconstructed from human brain surface activity: “Brain-to-Text” merges neuroscience, medicine, and informatics

Researchers have demonstrated that it is possible to reconstruct basic speech sounds, individual words, and entire sentences from neural activity recorded at the surface of the human cerebral cortex. The system, called Brain-to-Text, integrates methods from neuroscience, clinical medicine, and computer science to transform brain signals associated with spoken language into written text. The findings are reported by a collaborative team from Karlsruhe Institute of Technology (KIT) and the Wadsworth Center, USA, in the journal Frontiers in Neuroscience.

Speech production in humans relies on coordinated activity across cortical areas. Electrocorticography (ECoG) records electrical activity directly from the cortical surface via electrode arrays and captures speech-related activity with high temporal and spatial resolution. Using these recordings, the Brain-to-Text system models phones, the basic building blocks of speech, and applies automatic speech recognition (ASR) methods and machine learning to infer the most probable sequence of spoken words.

“The prospect of communicating with machines directly through brain activity has long been discussed,” says Tanja Schultz, who led the project at KIT’s Cognitive Systems Lab. “Our results show that both basic units of speech—phones—and continuously spoken sentences can be recognized from cortical activity, marking a substantial step toward brain-driven communication interfaces.”

Brain activity recorded by electrocorticography (blue circles). From the activity patterns (blue/yellow), spoken words can be recognized. Image credit: The researchers.

The data used to develop Brain-to-Text were recorded from seven patients undergoing clinical treatment for epilepsy in the United States. As part of their clinical care, each patient had an ECoG electrode array temporarily placed on the cortical surface. While the patients read sample texts aloud, researchers recorded their cortical signals with fine temporal and spatial resolution. Those recordings were later analyzed by the team in Karlsruhe to design and train decoding models that link measured neural patterns to spoken phones and words.

The Brain-to-Text approach combines cortical signal decoding with language modeling: neural patterns are first mapped to likely phonetic elements, and then statistical models of language assemble these elements into coherent word sequences. This hybrid strategy leverages advances in automatic speech recognition and modern machine learning to narrow down the most plausible textual representation of the recorded utterances.
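The two-stage idea described above can be illustrated with a toy decoder. The sketch below is not the study's implementation: the lexicon, the bigram probabilities, the per-frame phone scores, and all function names are hypothetical, and candidate sentences are simply enumerated rather than searched efficiently. It only shows how phone-level evidence and a language model can be combined into one score, and how the language model separates words that sound identical.

```python
import itertools
import math

# Hypothetical pronunciation dictionary (word -> phone sequence).
LEXICON = {
    "see": ["s", "iy"],
    "sea": ["s", "iy"],  # same phones as "see"; only the language model can tell them apart
    "me": ["m", "iy"],
}

# Hypothetical bigram log-probabilities; "<s>" marks the sentence start.
BIGRAM_LOGP = {
    ("<s>", "see"): math.log(0.6),
    ("<s>", "sea"): math.log(0.2),
    ("<s>", "me"): math.log(0.2),
    ("see", "me"): math.log(0.7),
    ("sea", "me"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-3)  # assumed penalty for word pairs not in the table


def toy_frame(phone, phones=("s", "iy", "m")):
    """Made-up per-frame phone log-likelihoods that favor `phone`."""
    return {p: math.log(0.8 if p == phone else 0.1) for p in phones}


def alignment_score(phones, frame_logp):
    """Best log-score of a monotonic alignment in which each phone covers >= 1 frame."""
    neg_inf = float("-inf")
    if not phones or not frame_logp:
        return neg_inf
    best = [neg_inf] * len(phones)
    best[0] = frame_logp[0][phones[0]]
    for frame in frame_logp[1:]:
        new = [neg_inf] * len(phones)
        for j, phone in enumerate(phones):
            # Either stay in the same phone or advance from the previous one.
            prev = max(best[j], best[j - 1] if j > 0 else neg_inf)
            if prev > neg_inf:
                new[j] = prev + frame[phone]
        best = new
    return best[-1]  # the alignment must end in the last phone


def language_score(words):
    """Sum of bigram log-probabilities over the word sequence."""
    score, prev = 0.0, "<s>"
    for word in words:
        score += BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)
        prev = word
    return score


def decode(frame_logp, max_words=2, lm_weight=1.0):
    """Return the word sequence with the best combined phone and language score."""
    best_words, best_score = None, float("-inf")
    for n in range(1, max_words + 1):
        for words in itertools.product(LEXICON, repeat=n):
            phones = [p for w in words for p in LEXICON[w]]
            score = alignment_score(phones, frame_logp)
            score += lm_weight * language_score(words)
            if score > best_score:
                best_words, best_score = list(words), score
    return best_words


frames = [toy_frame(p) for p in ["s", "s", "iy", "m", "iy", "iy"]]
print(decode(frames))  # ['see', 'me']
```

In this toy case "see me" and "sea me" fit the phone evidence equally well, so the choice between them comes entirely from the language model. A practical recognizer would replace the exhaustive enumeration with a beam or Viterbi search over a much larger vocabulary.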

Importantly, the current system decodes brain activity recorded while the patients spoke aloud; it does not decode imagined speech. The authors emphasize, however, that this work lays an important foundation for future efforts aimed at decoding imagined or inner speech. Translating non-verbalized thought into text remains a major challenge, but Brain-to-Text establishes key methods and demonstrates that meaningful speech content can be recovered from cortical activity patterns.

About this neuroscience research

The research is the result of interdisciplinary collaboration between experts in informatics, neuroscience, and clinical neurology. In Karlsruhe, signal processing and automatic speech recognition techniques were developed and adapted for neural data. Christian Herff and Dominic Heger, who implemented the Brain-to-Text system during their doctoral research, note that the models also enable detailed investigations into which cortical regions carry information about individual phones and how those areas interact during speech production.

Beyond advancing basic science, Brain-to-Text may inform assistive technologies for individuals with severe motor impairments. For people who are locked-in or otherwise unable to speak, systems that decode intended speech from cortical activity could one day provide a direct means of communication. These clinical applications will require further development, noninvasive adaptations, and extensive validation.

Key performance and findings

The study reports that the Brain-to-Text system achieved word error rates as low as 25% and phone error rates below 50% on the experimental data. Additionally, the analysis identified specific cortical regions that contain substantial information about individual phones, contributing to the broader understanding of the neural basis of continuous speech production.
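For readers unfamiliar with these metrics, word and phone error rates are conventionally computed as the edit distance (substitutions, insertions, and deletions) between the recognized sequence and the reference, divided by the reference length. The sketch below shows that standard calculation; the function name and example sentences are illustrative assumptions and are not taken from the study's data or scoring code.

```python
def error_rate(reference, hypothesis):
    """Edit distance between two token lists, divided by the reference length.

    Pass lists of words for a word error rate, or lists of phones
    for a phone error rate.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1      # reference token missing from hypothesis
            insertion = cur[j - 1] + 1  # extra token in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(reference)


# One wrong word out of four gives an error rate of 25%.
print(error_rate("we choose to go".split(), "we chose to go".split()))  # 0.25
```

Because insertions count as errors, this measure can exceed 100% when the hypothesis is much longer than the reference.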

Publication details

The full open-access study, titled “Brain-to-text: decoding spoken phrases from phone representations in the brain,” was authored by Christian Herff, Dominic Heger, Adriana de Pesters, Dominic Telaar, Peter Brunner, Gerwin Schalk, and Tanja Schultz and was published in Frontiers in Neuroscience on June 12, 2015 (doi:10.3389/fnins.2015.00217). The work is presented as a proof of concept that spoken phrases can be decoded from intracranial cortical recordings by combining neural decoding with language models.

Abstract (condensed)

This study demonstrates for the first time that continuous speech can be decoded into text from intracranial electrocorticographic recordings. The Brain-to-Text system models individual phones, applies techniques from automatic speech recognition, and converts brain activity recorded during speech into the corresponding text. The results show word error rates as low as 25% and phone error rates below 50%, and the analysis identifies cortical regions that are informative about phones, an encouraging step toward communication interfaces based on imagined speech.
