AI Reconstructs Voices from Microscopic Neck Movements

Summary: Researchers have developed a wearable Multiaxial Strain Mapping Sensor that reads microscopic movements of the neck’s skin and muscles and reconstructs speech in real time. This AI-driven, silent-speech system can reproduce a person’s voice even when the vocal cords do not vibrate, offering a new path for voice restoration and noise-immune communication.

By converting tiny, otherwise imperceptible throat movements into audio using computer vision and machine learning, the technology provides a practical solution for people who have lost their natural voice and for environments where conventional microphones fail.

Key Points

Noise-Immune Communication: Because the sensor detects skin strain instead of airborne sound, it works reliably in very loud places—factories, construction sites, and other industrial settings—where microphones struggle.
Restoring Identity: The system pairs decoded throat movement patterns with personalized voice synthesis so it can reproduce an individual’s pre-surgery or pre-illness voice rather than delivering a robotic, generic tone.
Silent Communication: Wearers can communicate without producing audible sound, enabling private or nondisruptive conversations in libraries, theaters, meeting rooms, or sensitive operations.
Everyday Wearability: The device is designed for real-world use: it is comfortable to wear, adjustable to fit different neck anatomies, and robust to body motion and daily activity.

Source: POSTECH

Overview

A team led by Professor Sung-Min Park and Dr. Sunguk Hong at POSTECH (Pohang University of Science and Technology) created a soft, camera-based interface that maps multiaxial strain across the neck’s skin. Published in Cyborg and Bionic Systems, the work demonstrates a practical silent speech interface (SSI) that combines a soft optical strain sensor with deep learning to decode intended speech and reconstruct it with personalized voice synthesis.

Image: A person's neck with glowing lines. Researchers hope this technology will accelerate the day when patients with speech disorders can reclaim their original voices. Credit: Neuroscience News

Speech is not produced solely by the vocal cords. As we form words, coordinated contractions of neck muscles and shifts in skin tension trace minute deformation patterns across the throat. The research team exploited this “movement map.” Their Multiaxial Strain Mapping Sensor uses a soft silicone patch marked with micromarkers and a tiny embedded camera to track these submillimeter deformations continuously and with high sensitivity.
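To illustrate how tracked micromarkers could yield a strain estimate, the Python sketch below fits a least-squares displacement gradient to 2D marker positions before and after deformation and returns the small-strain components. It is not the POSTECH team's algorithm: the function name small_strain, the flat 2D coordinates, and the small-deformation assumption are all illustrative choices.

```python
def small_strain(ref, cur):
    """Least-squares small-strain tensor from matched 2D marker positions.

    ref, cur: lists of (x, y) marker coordinates before and after deformation
    (hypothetical inputs, e.g. marker centroids found in camera frames).
    Returns (e_xx, e_yy, e_xy), all dimensionless.
    """
    n = len(ref)
    if n < 3 or n != len(cur):
        raise ValueError("need at least 3 matched markers")

    # Subtracting the centroids removes rigid translation of the patch.
    rx = sum(p[0] for p in ref) / n
    ry = sum(p[1] for p in ref) / n
    cx = sum(p[0] for p in cur) / n
    cy = sum(p[1] for p in cur) / n

    # Accumulate A = sum(dX dX^T) and B = sum(du dX^T).
    axx = axy = ayy = 0.0
    bxx = bxy = byx = byy = 0.0
    for (X, Y), (x, y) in zip(ref, cur):
        dX, dY = X - rx, Y - ry
        du, dv = (x - cx) - dX, (y - cy) - dY  # displacement relative to centroid
        axx += dX * dX
        axy += dX * dY
        ayy += dY * dY
        bxx += du * dX
        bxy += du * dY
        byx += dv * dX
        byy += dv * dY

    det = axx * ayy - axy * axy
    if abs(det) < 1e-12:
        raise ValueError("markers are collinear; strain is not identifiable")

    # Displacement gradient H = B * inverse(A).
    hxx = (bxx * ayy - bxy * axy) / det
    hxy = (bxy * axx - bxx * axy) / det
    hyx = (byx * ayy - byy * axy) / det
    hyy = (byy * axx - byx * axy) / det

    # The symmetric part of H is the small-strain tensor.
    return hxx, hyy, 0.5 * (hxy + hyx)


# Usage: a unit square of markers stretched by 1% along x.
ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cur = [(0.0, 0.0), (1.01, 0.0), (0.0, 1.01 - 0.01), (1.01, 1.0)]
e_xx, e_yy, e_xy = small_strain(ref, cur)
```

Repeating such a fit over local neighborhoods of markers would give a map of strain values rather than a single tensor. The small-strain form is only a reasonable approximation when deformations and rotations between frames are small.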

The device is adjustable for placement and tightness, and an automated calibration algorithm corrects baseline shifts when the sensor is reapplied. Together, the mechanical design and the software are intended to keep performance stable during everyday use and while the wearer is moving.
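The article does not describe how the calibration works, so the Python sketch below shows only one simple, generic way to correct a baseline shift: estimate each channel's resting value from a short window recorded while the wearer is not articulating, then subtract it from later frames. The function names, the median estimator, and the list-of-channels frame layout are assumptions made for illustration and are not the published method.

```python
from statistics import median


def estimate_baseline(rest_frames):
    """Per-channel median over frames recorded while the wearer is at rest.

    rest_frames: list of frames, each a list of strain values (one per channel).
    The median is used so that a single twitch or swallow does not skew the estimate.
    """
    if not rest_frames:
        raise ValueError("need at least one rest frame")
    return [median(channel) for channel in zip(*rest_frames)]


def remove_baseline(frame, baseline):
    """Subtract the resting offset so strain is measured relative to rest."""
    if len(frame) != len(baseline):
        raise ValueError("frame and baseline must have the same channel count")
    return [value - offset for value, offset in zip(frame, baseline)]


# Usage: three rest frames with two channels; the last frame has an outlier.
rest = [[0.10, -0.02], [0.12, -0.02], [0.11, 0.50]]
baseline = estimate_baseline(rest)
corrected = remove_baseline([0.21, 0.03], baseline)
```

Re-running the estimate each time the patch is put back on would absorb the offset introduced by a slightly different position or tightness, which is the kind of shift the article describes.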

Captured strain maps feed into an AI pipeline that decodes intended letters, words, and phrases. The decoded text is then voiced by a personalized text-to-speech model trained on the individual’s vocal characteristics, producing natural-sounding output that resembles the speaker’s original voice. Because the system interprets muscle-driven articulatory intent rather than airborne sound, it works silently and remains effective in noisy surroundings.

Previous approaches to silent speech often relied on electromyography (EMG) or electroencephalography (EEG). While informative, those methods typically require bulky equipment and are less comfortable for long-term wear. The POSTECH solution reduces hardware complexity and improves wearability without sacrificing decoding accuracy, as validated in experiments that included real-world noisy environments.

Potential applications are wide-ranging: restoring natural voice for laryngectomy patients or people with vocal disorders, enabling hands-free communication on noisy industrial floors, and supporting discreet or silent messaging in public or professional settings. The technology also has potential for specialized operational contexts that require low-observability communication.

Funding: This research received support from the Doctoral Course Research Grant Program and the Mid-career Researcher Program of the Ministry of Education, the Bio & Medical Technology Development Program, and the Pioneering Convergence Science and Technology Development Program of the Ministry of Science and ICT.

Key Questions Answered:

Q: Could this device “eavesdrop” on my silent thoughts?

A: No. The system decodes subvocalized speech—that is, intentional muscle movements used when you silently articulate words. It does not read thoughts or mental imagery unrelated to explicit muscular action.

Q: How does this improve on existing electronic larynx devices?

A: Unlike handheld electrolarynges that produce a monotone buzzing sound, this wearable system is hands-free and synthesizes a natural, personalized voice that more closely matches the user’s original vocal identity.

Q: Can it be used for silent communication in public or secure environments?

A: Yes. The technology supports silent, low-observability communication suitable for libraries, conference rooms, industrial sites, or any context where audible speech is impractical or undesirable.

Editorial Notes:

This article was edited by a Neuroscience News editor.
The journal paper was reviewed in full by the editorial staff.
Additional context was added by the editorial team for clarity and accessibility.

About this AI and neurotech research news

Author: Yung-Eui Kang
Source: POSTECH
Contact: Yung-Eui Kang – POSTECH
Image: The image is credited to Neuroscience News

Original Research (open access):
Soft Multiaxial Strain Mapping Interface with AI-Driven Decoding for Silent Speech in Noise — Sunguk Hong, Junyoung Yoo, and Sung-Min Park. Cyborg and Bionic Systems. DOI: 10.34133/cbsystems.0536

Abstract (condensed)

Silent speech interfaces (SSIs) provide an alternative to microphones by capturing throat muscle-induced strain patterns and converting them into intelligible speech even in extreme noise. The proposed SSI combines a computer-vision-based optical strain sensor (CVOS)—a soft silicone substrate with micromarkers and a miniature camera—with a deep-learning decoding pipeline and personalized text-to-speech synthesis.

The CVOS captures multiaxial strain maps with high sensitivity and reliability. The inference pipeline includes physics-based automated baseline calibration and content-adaptive temporal attention to handle anatomical variability and dynamic conditions. A personalized voice model then reconstructs natural-sounding speech from the decoded content. Validation in noisy, real-world scenarios confirms the system’s robustness and practical applicability for alphabet-based communication and voice restoration.
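As a rough illustration of what temporal attention does in a decoder like this, the Python sketch below scores each time frame's feature vector against a query vector, turns the scores into softmax weights, and returns the weighted average. This is generic scaled dot-product attention pooling, not the paper's "content-adaptive temporal attention"; the function name, the query vector, and the toy features are hypothetical.

```python
import math


def attention_pool(frames, query):
    """Softmax-weighted average of per-frame feature vectors.

    frames: list of feature vectors, one per time step (all of len(query)).
    query:  vector that would be learned in a real model (hypothetical here).
    Returns (pooled_vector, weights).
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(query)
    scale = math.sqrt(dim)

    # Scaled dot-product score for each time step.
    scores = [sum(f * q for f, q in zip(frame, query)) / scale for frame in frames]

    # Numerically stable softmax over time.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frames, one output value per feature dimension.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Usage: two quiet frames followed by one frame with strong activity.
frames = [[0.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
pooled, weights = attention_pool(frames, query=[1.0, 0.0])
```

In this toy input the third frame receives most of the weight, which shows how attention can let a decoder focus on the moments when the throat is actually moving rather than treating every frame equally.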