AI Decodes Microscopic Neck Movements to Restore Speech

Summary: Researchers have created a wearable Multiaxial Strain Mapping Sensor that detects microscopic movements of the neck’s muscles and skin and reconstructs speech in real time. The system, driven by AI, can reproduce a user’s voice without any vibration of the vocal cords, offering a new option for people who have lost their natural voice due to illness or surgery.

This technology decodes silent or subvocalized speech by reading skin strain rather than sound waves, enabling communication even when no audible voice is produced.

Key Points

Noise-Resilient Communication: Because the sensor measures skin movement instead of acoustic signals, it performs reliably in extremely noisy environments—such as factories and construction sites—where traditional microphones struggle.
Voice Restoration: For people who have undergone laryngeal surgery or lost vocal function, the system can synthesize a natural-sounding voice modeled on the individual’s existing vocal characteristics, rather than producing a robotic tone.
Silent Speech Capability: The interface allows clear, private communication in quiet public settings (libraries, theaters) or during confidential operations, enabling users to convey complex instructions without producing sound.
Practical Wearability: The device is designed for everyday use. Its soft silicone interface and adjustable fit, combined with algorithms that correct placement changes, make the system robust during movement and in demanding work conditions.

Source: POSTECH

Overview: A research team led by Professor Sung-Min Park and Dr. Sunguk Hong at POSTECH (Pohang University of Science and Technology) developed a soft, camera-based skin-strain sensor and paired it with AI-driven decoding to convert throat muscle movements into intelligible, personalized speech.

Researchers hope this technology will accelerate the day when patients with speech disorders can reclaim their original voices. Credit: Neuroscience News

Their results were published in the online edition of Cyborg and Bionic Systems, a Science Partner Journal focused on biomedical engineering.

The system is built on a simple observation: speaking produces subtle, coordinated movements of the neck’s muscles and skin that trace a characteristic “movement map.” These minute strain patterns carry information about the intended speech, even when sound is not produced.

To capture those patterns, the team created a Multiaxial Strain Mapping Sensor combining a small camera with micromarkers embedded in a soft silicone substrate. Worn on the neck, the sensor detects tiny skin deformations and generates multiaxial strain maps that reflect the user’s articulatory actions.
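
To illustrate how marker displacements can be turned into strain values, here is a minimal sketch. It assumes small deformations and fits one homogeneous strain state to a patch of tracked markers by least squares. The article does not describe the team's actual computation, so the function name `patch_strain` and the fitting method are illustrative assumptions only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def patch_strain(ref: List[Point], cur: List[Point]) -> Tuple[float, float, float]:
    """Least-squares small-strain estimate (exx, eyy, exy) for one marker patch.

    ref: marker positions at rest; cur: the same markers after deformation.
    Hypothetical helper, not the published method.
    """
    n = len(ref)
    # Centroid of the reference markers and mean displacement (rigid shift).
    cx = sum(p[0] for p in ref) / n
    cy = sum(p[1] for p in ref) / n
    disp = [(c[0] - r[0], c[1] - r[1]) for r, c in zip(ref, cur)]
    ux = sum(d[0] for d in disp) / n
    uy = sum(d[1] for d in disp) / n

    # Accumulate M = sum(X X^T) and B = sum(u X^T) over centered data.
    mxx = mxy = myy = 0.0
    bxx = bxy = byx = byy = 0.0
    for (rx, ry), (dx, dy) in zip(ref, disp):
        x, y = rx - cx, ry - cy
        du, dv = dx - ux, dy - uy
        mxx += x * x
        mxy += x * y
        myy += y * y
        bxx += du * x
        bxy += du * y
        byx += dv * x
        byy += dv * y

    det = mxx * myy - mxy * mxy
    if abs(det) < 1e-12:
        raise ValueError("markers are collinear; strain is not identifiable")

    # Displacement gradient G = B * inverse(M).
    gxx = (bxx * myy - bxy * mxy) / det
    gxy = (bxy * mxx - bxx * mxy) / det
    gyx = (byx * myy - byy * mxy) / det
    gyy = (byy * mxx - byx * mxy) / det

    # Small-strain tensor is the symmetric part of G.
    return gxx, gyy, 0.5 * (gxy + gyx)
```

Applying this to many overlapping patches across the sensor area would yield one (exx, eyy, exy) triple per patch, which is one simple way to picture a multiaxial strain map.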

The sensor’s fit and position are adjustable for individual anatomy. A physics-based calibration routine and an automated alignment algorithm correct baseline shifts when the device is reapplied, which helps keep performance stable in real-world use.
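
The article does not detail the alignment algorithm. As a generic stand-in, the sketch below shows a standard 2D least-squares rigid registration that rotates and translates newly observed marker positions back onto a stored baseline. The function name `rigid_align` and the assumption that markers are matched one-to-one are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rigid_align(baseline: List[Point], observed: List[Point]) -> List[Point]:
    """Rotate and translate `observed` markers onto `baseline` (least squares).

    Assumes both lists hold the same markers in the same order.
    """
    n = len(baseline)
    bcx = sum(p[0] for p in baseline) / n
    bcy = sum(p[1] for p in baseline) / n
    ocx = sum(p[0] for p in observed) / n
    ocy = sum(p[1] for p in observed) / n

    # Sums of cross and dot products between centered point pairs.
    s_cross = 0.0
    s_dot = 0.0
    for (bx, by), (ox, oy) in zip(baseline, observed):
        bx, by = bx - bcx, by - bcy
        ox, oy = ox - ocx, oy - ocy
        s_cross += ox * by - oy * bx
        s_dot += ox * bx + oy * by

    # Rotation angle that best maps observed onto baseline.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)

    aligned = []
    for ox, oy in observed:
        ox, oy = ox - ocx, oy - ocy
        aligned.append((c * ox - s * oy + bcx, s * ox + c * oy + bcy))
    return aligned
```

A rigid fit like this removes only placement offset and rotation. Any residual difference after alignment would then be treated as genuine skin strain or as a baseline shift for a separate calibration step to handle.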

Captured strain patterns are fed into a deep-learning pipeline that decodes the intended letters and words. The decoded text is then rendered into speech by a personalized text-to-speech model trained on the user’s vocal features, producing a natural-sounding voice even when the vocal folds do not vibrate.
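
The paper's abstract mentions a content-adaptive temporal attention mechanism in the decoding pipeline. The sketch below shows the general idea of temporal attention pooling only: each time frame of strain features is scored against a query vector, and the frames are averaged with softmax weights so that informative frames count more. The function name `attention_pool` and the use of a single query vector are assumptions, and in a trained model the query would be learned rather than supplied by hand.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Softmax-weighted average of per-frame feature vectors.

    Returns (pooled_features, weights). Hypothetical illustration only.
    """
    # Scaled dot-product score for each time frame.
    scale = math.sqrt(len(query))
    scores = [sum(f * q for f, q in zip(frame, query)) / scale for frame in frames]

    # Numerically stable softmax over time.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frames, one value per feature dimension.
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    return pooled, weights
```

The pooled vector could then be passed to a classifier over letters or words. Because the weights depend on each frame's content, frames recorded during articulation can dominate over frames recorded while the neck is at rest.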

Previous approaches to silent speech relied on signals such as electromyography (EMG) or electroencephalography (EEG), which often require bulky equipment and uncomfortable electrodes. The POSTECH team’s soft CVOS (computer vision-based optical strain) approach improves wearability and achieves high sensitivity and reliability, making it more suitable for everyday environments.

Through experiments in noisy, realistic scenarios, the researchers demonstrated robust speech reconstruction, indicating the system’s potential for diverse real-world applications: assistive devices for patients after laryngeal surgery, hands-free communication in industrial settings without radio or microphone access, and private silent communication in public spaces.

Professor Sung-Min Park commented, “We hope this technology helps accelerate the day patients with speech disorders can reclaim their voices. It has broad potential—from assisting laryngectomized patients to enabling reliable communication in noisy work environments and facilitating silent conversations.”

Funding: This work was supported by the Doctoral Course Research Grant Program, the Mid-career Researcher Program of the Ministry of Education, the Bio & Medical Technology Development Program, and the Pioneering Convergence Science and Technology Development Program of the Ministry of Science and ICT.

Key Questions Answered:

Q: Could someone “eavesdrop” on my silent thoughts?

A: No. The sensor decodes subvocalized speech only when a user makes physical articulatory movements in the neck. It measures the muscle and skin strain produced by those movements, not brain activity or unspoken thoughts.

Q: How does this differ from an electronic larynx?

A: Electronic larynx devices generate a monotonous buzz and require manual placement. The Multiaxial Strain Mapping Sensor is wearable and hands-free, and it can synthesize a natural voice tailored to the individual rather than producing a robotic sound.

Q: Is it practical for secret or quiet communication?

A: Yes. One intended use is silent communication where audible speech is inappropriate, such as in libraries or theaters. The same approach also applies where audible speech is impractical, such as noisy industrial settings where microphones cannot be relied upon.

Editorial Notes:

This article was edited by a Neuroscience News editor.
The journal paper was reviewed in full by our staff.
Additional context was added for clarity.

About this AI and neurotech research news

Author: Yung-Eui Kang
Source: POSTECH
Contact: Yung-Eui Kang – POSTECH
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Soft Multiaxial Strain Mapping Interface with AI-Driven Decoding for Silent Speech in Noise” by Sunguk Hong, Junyoung Yoo, and Sung-Min Park. Cyborg and Bionic Systems
DOI:10.34133/cbsystems.0536

Abstract

Soft Multiaxial Strain Mapping Interface with AI-Driven Decoding for Silent Speech in Noise

Silent speech interfaces (SSIs) provide an alternative to conventional microphones for clear communication in noisy settings. This study presents an SSI that reconstructs voice by monitoring continuous multiaxial strain maps produced by throat muscle activity.

The system integrates a computer vision-based optical strain (CVOS) sensor with deep learning voice reconstruction to enable reliable alphabetic and word-level communication under extreme noise. The CVOS sensor—built from a soft silicone substrate with micromarkers and a miniature camera—captures high-sensitivity marker displacements and complex strain patterns with improved scalability and reliability over many wearable sensor approaches.

The processing pipeline includes physics-based automated baseline calibration and a content-adaptive temporal attention mechanism to robustly analyze captured strain signals. A personalized text-to-speech module then reconstructs the speaker’s voice based on decoded content and learned vocal characteristics. Real-time adaptive processing compensates for anatomical differences across and within users, supporting robust use during natural movement.

Combined algorithmic design and interface engineering enable accurate alphabet-based communication. Validation in real-world noisy environments confirms the approach’s practical applicability for assistive voice restoration and resilient silent speech communication in industrial or quiet public settings.