Improving Voice Recognition for People with Speech Impairments

Summary: A new study shows that automatic speech recognition (ASR) systems trained on recordings from people with Parkinson’s disease can transcribe similar atypical speech patterns about 30% more accurately. Researchers collected about 151 hours of training speech from participants with varying degrees of dysarthria, a motor speech disorder common in Parkinson’s disease, and used those samples to fine-tune ASR models for users with neuromotor speech impairments.

The findings demonstrate that including diverse, atypical speech in training datasets substantially improves voice recognition for people with speech disabilities. This work could make voice-controlled devices and assistive technologies far more accessible and reliable for individuals with Parkinson’s and related conditions.

Key facts:

ASR systems fine-tuned on Parkinson’s-related speech achieved roughly 30% better transcription accuracy than systems trained only on typical speech.
Researchers compiled about 151 hours of training recordings from people with dysarthria associated with Parkinson’s disease.
Results suggest improved accessibility for users with neuromotor speech disorders, and the dataset is being shared to accelerate research.

Source: Beckman Institute

Lead author Mark Hasegawa-Johnson found unexpected human interest while reviewing the recordings: across more than 150 hours of speech, participants shared everything from everyday instructions to recipes like Eggs Florentine.

Hasegawa-Johnson directs the Speech Accessibility Project at the University of Illinois Urbana-Champaign, housed in the Beckman Institute for Advanced Science and Technology. The project’s goal is to make voice recognition technology more usable for people with diverse speech patterns and disabilities.

In the study’s primary experiment, researchers fine-tuned an automatic speech recognizer on roughly 151 hours of recorded speech from people with Parkinson’s-related dysarthria. When evaluated on a separate test set of similar recordings, the fine-tuned ASR cut the word error rate from 36.3% to about 23.7%, relative to a baseline model trained only on typical speech.

The team consulted with Parkinson’s disease experts and community members to develop prompts relevant to participants’ lives. Credit: Neuroscience News

The study appears in the Journal of Speech, Language, and Hearing Research. The Speech Accessibility Project has made the speech recordings and annotations available to qualified researchers, nonprofit organizations, and companies working to improve voice recognition accessibility.

“Our results indicate that a large database of atypical speech can meaningfully improve speech technology for people with disabilities,” said Hasegawa-Johnson, a professor of electrical and computer engineering and a researcher at the Beckman Institute. “We hope other organizations will use this data to make voice-controlled devices more inclusive.”

Automatic speech recognition powers many everyday technologies: smartphones, virtual assistants, transcription services and hands-free communication tools. However, ASR systems often underperform for people with neuromotor disorders, whose speech can be strained, slurred, slow, fast, or otherwise atypical. These patterns, collectively known as dysarthria, make reliable voice interaction difficult for those who could benefit most from these technologies.

“Many people who might gain the most from voice-controlled devices encounter the greatest difficulty using them,” Hasegawa-Johnson said. “We asked whether exposing an ASR system to speech from people with dysarthria could teach it to recognize those patterns better.”

The research team recruited about 250 adults with Parkinson’s-related dysarthria. A speech-language pathologist screened prospective participants to determine eligibility and ensure a representative range of speech patterns. Selected participants recorded speech from their homes using personal computers, smartphones, or assistive devices. Tasks included repeating common voice commands, reading passages, and responding to open-ended prompts such as “Please explain the steps to making breakfast for four people.”

Participation proved meaningful for many volunteers. Clarion Mendes, a speech-language pathologist on the project, noted that some participants felt reconnected to everyday communication: “This project has brought hope, excitement and energy to many participants and their loved ones.” Some responses were practical and concise—others, like the detailed Eggs Florentine recipe, were richly personal.

Content was developed in consultation with Parkinson’s specialists and community members to reflect real-life needs: medication names, conversational starters and practical commands that mirror how people actually use voice technology. Participants were encouraged to speak naturally rather than straining to make their speech artificially clear.

To measure learning, the dataset was split into training, development, and test sets. The training set included 190 participants and about 151 recorded hours. The model was fine-tuned on that data, validated on a development set, and finally evaluated on a reserved test set. Human transcribers checked the system’s outputs by manually transcribing hundreds of recordings per participant to verify accuracy.
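The split described above assigns whole participants, not individual recordings, to each partition, so no test speaker is heard during training. The following is a minimal sketch of that idea, not the project's actual tooling. The function name `split_speakers`, the speaker IDs, and the seed are hypothetical. The partition sizes follow figures reported in this article (190 training participants and 42 test speakers); the development-set size of 21 is an assumption inferred from the 211 participants in the data package.

```python
import random


def split_speakers(speaker_ids, n_dev, n_test, seed=0):
    """Randomly assign whole speakers to train/dev/test partitions.

    Splitting by speaker (not by recording) keeps every test speaker
    unseen during training and validation.
    """
    ids = sorted(set(speaker_ids))
    if n_dev + n_test >= len(ids):
        raise ValueError("dev and test sizes must leave at least one training speaker")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    test = ids[:n_test]
    dev = ids[n_test:n_test + n_dev]
    train = ids[n_test + n_dev:]
    return {"train": train, "dev": dev, "test": test}


# Hypothetical speaker IDs: 211 in the data package plus 42 test speakers.
speakers = [f"spk{i:03d}" for i in range(253)]
# n_dev=21 is an assumption (211 - 190); n_test=42 is from the abstract.
splits = split_speakers(speakers, n_dev=21, n_test=42)
print({name: len(ids) for name, ids in splits.items()})
```

Each recording would then inherit the partition of the participant who produced it.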

Results showed the fine-tuned ASR reached a word error rate (WER) of about 23.7% on the test set, compared with a 36.3% WER for a system trained only on typical speech. Error rates improved for nearly all individual speakers in the test set, including some with atypical patterns such as very rapid speech or stuttering.
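For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition and the arithmetic behind the two reported figures. It is not the study's scoring code: the function names and the example sentences are hypothetical, and the article does not say how the summary's "roughly 30%" figure was derived.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Hypothetical transcripts: one deleted word and one substituted word out of seven.
wer = word_error_rate(
    "please explain the steps to making breakfast",
    "please explain steps to make breakfast",
)
print(f"example WER: {wer:.3f}")

# WER values reported in this article, in percent.
baseline_wer, finetuned_wer = 36.3, 23.7
print(f"absolute reduction: {baseline_wer - finetuned_wer:.1f} points")
print(f"relative reduction: {relative_reduction(baseline_wer, finetuned_wer):.1%}")
```

On the reported figures, the drop is 12.6 percentage points, or roughly a third of the baseline error.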

“The magnitude of the benefit was striking,” Hasegawa-Johnson said. Participant feedback reinforced the impact: volunteers expressed excitement at the prospect that their phones and smart speakers might better understand them in the future.

Funding: The research received support from five major technology companies—Amazon, Apple, Google, Meta and Microsoft—as well as grants from the National Institute on Deafness and Other Communication Disorders (NIH award R13DC003383) and the National Science Foundation (award 1725729). The findings and conclusions are those of the authors and do not necessarily reflect the official views of the NIH.

About the Speech Accessibility Project

The Speech Accessibility Project is a multi-institutional initiative based at the Beckman Institute, University of Illinois Urbana-Champaign, launched to improve voice recognition for people with diverse speech patterns and communication disabilities. The project collects, curates, and distributes transcribed U.S. English speech from individuals with conditions such as Parkinson’s disease, Down syndrome, cerebral palsy, amyotrophic lateral sclerosis (ALS), and post-stroke speech impairment. Participants record speech from home, and all samples are manually transcribed and annotated to support ASR and related machine learning research.

As of mid-2024, the project had shared hundreds of thousands of speech samples with partner organizations to accelerate the development of more inclusive speech technologies.

Conducting research with the Speech Accessibility Project

The SAP released a data package containing approximately 170 hours of annotated speech from 211 participants with dysarthria related to Parkinson’s disease, available to researchers and organizations conducting accessibility-focused ASR research. The project accepts proposals from academic labs, nonprofits, and companies interested in using these recordings to improve voice recognition for people with speech disabilities.

About this research and reporting

Author: Jenna Kurtzweil
Source: Beckman Institute
Contact: Jenna Kurtzweil – Beckman Institute
Image: Credited to Neuroscience News

Original research: Open access. “Community-Supported Shared Infrastructure in Support of Speech Accessibility” by Mark Hasegawa-Johnson et al., Journal of Speech, Language, and Hearing Research.

Abstract

Community-Supported Shared Infrastructure in Support of Speech Accessibility

Purpose:

The Speech Accessibility Project (SAP) aims to accelerate research and development in automatic speech recognition and related machine learning applications for people with speech disabilities. This article introduces the SAP resource and presents baseline analysis from the first public data release.

Method:

SAP collects, curates, and distributes transcribed U.S. English speech from people with speech and language disabilities. Participants record at home using personal devices. All samples are manually transcribed, and a subset of recordings per participant is annotated along diagnostic dimensions. Participants were randomly assigned to training, development, and test sets to enable controlled ASR experiments and evaluation of error rates.

Results:

The 2023-10-05 SAP data package includes speech from 211 people with dysarthria related to Parkinson’s disease, plus an additional 42 speakers in the test set. A baseline ASR that achieves a low error rate for typical speech transcribed the dysarthric test speech with a substantially higher word error rate; fine-tuning the model on dysarthric speech reduced that error rate meaningfully.

Conclusions:

Preliminary results indicate that a large corpus of dysarthric and dysphonic speech can significantly enhance speech technology for people with disabilities. By sharing these data and annotations, the Speech Accessibility Project seeks to accelerate the development of more inclusive and effective voice recognition systems.