How AI Detects Depression in Children's Speech

Summary: A machine learning model that analyzes children’s speech was able to identify young children diagnosed with anxiety and depression with about 80% accuracy. The system singled out eight acoustic features associated with higher risk; notably, lower voice pitch, repeated speech inflections, and a higher-pitched reaction to a surprising buzzer were particularly indicative of internalizing disorders. Researchers plan to develop a practical screening tool—potentially a smartphone app—that records and analyzes brief speech samples to help detect children at risk for anxiety and depression earlier and more reliably.

Source: University of Vermont

A new study published in the Journal of Biomedical and Health Informatics finds that machine learning applied to brief speech samples can detect signs of anxiety and depression in preschool and early school-age children, offering a fast and objective supplement to traditional diagnostic methods.

Internalizing disorders—primarily anxiety and depression—affect roughly one in five children, yet these conditions are often missed in early childhood. Children under eight frequently cannot describe their emotional experiences consistently, so caregivers and clinicians must infer mental health from behavior and reported symptoms. Long wait times for specialists, insurance barriers, and lack of symptom recognition by parents contribute to delayed or missed diagnosis and treatment.

“We need quick, objective tests to catch kids when they are suffering,” says Ellen McGinnis, a clinical psychologist at the University of Vermont Medical Center’s Vermont Center for Children, Youth and Families and lead author of the study. Early identification is critical because treatments are more effective while a child’s brain is still developing; untreated internalizing disorders can increase the risk of substance misuse and suicidal behavior later in life.

Current clinical diagnosis typically involves a 60–90 minute semi-structured interview with a trained clinician and the child's primary caregiver. To find a faster, more scalable alternative, the research team, led by Ellen McGinnis and biomedical engineer Ryan McGinnis, tested whether machine learning applied to audio recordings could reliably flag children with internalizing disorders from a single brief, stress-inducing speech task.

Seventy-one children ages three to eight participated in an adapted mood-induction protocol based on the Trier Social Stress Task. Each child was asked to improvise a three-minute story and was told it would be judged on how interesting it was. A researcher playing the judge maintained a neutral or stern demeanor and offered only neutral or negative feedback. A buzzer sounded 90 seconds into the task and again with 30 seconds remaining (150 seconds in), creating mild, surprising interruptions intended to elicit stress-related vocal responses.
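The buzzer timing divides each three-minute recording into three phases. As a minimal sketch, and not the study's own code, the phase boundaries can be computed from the task length and the buzzer times. The function names, the dictionary keys, and the assumption that the audio is held as a plain sequence of samples are all hypothetical.

```python
# Illustrative sketch only: names, keys, and defaults are assumptions,
# chosen to mirror the protocol described in the article.

def phase_boundaries(total_s=180.0, first_buzzer_s=90.0, remaining_at_second_s=30.0):
    """Return (start, end) times in seconds for each phase of the task."""
    second_buzzer_s = total_s - remaining_at_second_s
    if not 0.0 < first_buzzer_s < second_buzzer_s < total_s:
        raise ValueError("buzzers must fall inside the task, in order")
    return {
        "before_first_buzzer": (0.0, first_buzzer_s),
        "between_buzzers": (first_buzzer_s, second_buzzer_s),
        "after_second_buzzer": (second_buzzer_s, total_s),
    }


def split_samples(samples, sample_rate_hz, boundaries):
    """Slice a sequence of audio samples into the phases given by `boundaries`."""
    phases = {}
    for name, (start_s, end_s) in boundaries.items():
        start = int(round(start_s * sample_rate_hz))
        end = int(round(end_s * sample_rate_hz))
        phases[name] = samples[start:end]
    return phases


if __name__ == "__main__":
    for name, (start_s, end_s) in phase_boundaries().items():
        print(f"{name}: {start_s:.0f} to {end_s:.0f} s")
```

With the default arguments, the phases run from 0 to 90 seconds, 90 to 150 seconds, and 150 to 180 seconds.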

“The task is designed to be stressful, and to put them in the mindset that someone was judging them,” says Ellen McGinnis.

Each child was also assessed using a structured clinical interview and validated parent-report questionnaires—standard tools for identifying pediatric internalizing disorders. Audio recordings of the storytelling task were then processed to extract statistical and acoustic features. A machine learning classifier was trained to relate these features to clinical diagnoses.

The model identified children with an internalizing disorder with approximately 80% overall accuracy. The middle portion of the recording, between the two buzzer interruptions, proved most predictive. In this sample, the speech-based classifier was more accurate overall than parent-reported symptom checklists and returned results within seconds of recording, offering a rapid, objective assessment option.
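Overall accuracy is usually read alongside sensitivity and specificity, which the original manuscript also reports. The sketch below shows how the three figures are derived from a classifier's predictions. The function name is hypothetical, and the labels are made-up toy data, not results from the study.

```python
# Illustrative sketch only: toy labels, not study data.

def screening_metrics(true_labels, predicted_labels):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = disorder)."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged in error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Ten hypothetical children, four with a diagnosis (label 1).
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    for name, value in screening_metrics(truth, predicted).items():
        print(f"{name}: {value:.2f}")
```

In this toy example the classifier is right for 7 of 10 children, yet it catches only 2 of the 4 with a diagnosis, which is why a screening tool's accuracy is best read together with its sensitivity and specificity.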

Eight acoustic features emerged as informative for distinguishing affected children. Three stood out most strongly: a lower-pitched voice, repetitive speech inflections and content, and a higher-pitched response immediately after the surprising buzzer. According to the researchers, the first two match the clinical picture of depression in young children (speaking in a monotone and repeating oneself), while the third is consistent with heightened startle or reactivity to sudden stimuli.
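The article does not describe how the study measured pitch. As a generic illustration of one way a pitch feature can be computed, the sketch below estimates the fundamental frequency of a short audio frame by autocorrelation. The function names, the 100 to 600 Hz search range, and the synthetic test tones are all assumptions, and real child speech is far noisier than a clean tone.

```python
import math

# Illustrative sketch only: a textbook autocorrelation pitch estimate,
# not the feature extraction used in the study.

def sine_tone(freq_hz, sample_rate_hz, n_samples):
    """Generate a clean synthetic tone to stand in for a voiced frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n_samples)]


def estimate_pitch_hz(frame, sample_rate_hz, min_hz=100.0, max_hz=600.0):
    """Estimate the fundamental frequency of one frame by autocorrelation."""
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = int(sample_rate_hz / min_hz)
    if len(frame) <= max_lag:
        raise ValueError("frame too short for the requested pitch range")
    mean = sum(frame) / len(frame)
    centered = [x - mean for x in frame]
    best_lag, best_score = None, 0.0
    # The lag at which the frame best matches a shifted copy of itself
    # approximates one period of the voice.
    for lag in range(min_lag, max_lag + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag if best_lag else None


if __name__ == "__main__":
    rate = 8000
    lower = estimate_pitch_hz(sine_tone(200.0, rate, 800), rate)
    higher = estimate_pitch_hz(sine_tone(250.0, rate, 800), rate)
    print(f"lower tone: {lower:.0f} Hz, higher tone: {higher:.0f} Hz")
```

Averaging such estimates over a whole recording would give a mean-pitch feature, and comparing the frames just before and just after a buzzer would give a rough measure of the pitch response to it.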

Earlier work from the same group found similar behavioral markers using motion analysis during a fear-induction task. While motion-based measures can be accurate, they require specialized equipment (darkened room, motion sensors, guided setup). By contrast, the speech task is simple to administer: a judge, a voice recorder, and a buzzer are sufficient. This simplicity increases feasibility for clinical and community settings.

Image caption: A machine learning algorithm can detect signs of anxiety and depression in the speech patterns of young children, potentially providing a fast and easy way of diagnosing conditions that are difficult to spot and often overlooked in young people. Image credit: Anthony Kelly.

Looking ahead, the researchers plan to refine the speech analysis into a practical universal screening tool. A smartphone application that records a short story prompt and analyzes the audio immediately could enable routine screening in pediatric clinics, schools, or even at home—helping to identify children at risk before problems escalate or before parents recognize symptoms themselves. The team also envisions combining voice analysis with other technology-assisted measures, such as motion analysis, to create a comprehensive, scalable diagnostic battery.

Study authors include Ellen W. McGinnis, Steven P. Anderau, Jessica Hruschak, Reed D. Gurchiek, Nestor L. Lopez-Duran, Kate Fitzgerald, Katherine L. Rosenblum, Maria Muzik, and Ryan McGinnis. The paper, “Giving Voice to Vulnerable Children: Machine Learning Analysis of Speech Detects Anxiety and Depression in Early Childhood,” appears in the Journal of Biomedical and Health Informatics.

About this neuroscience research article

Source: University of Vermont
Media Contacts: Jeff Wakefield – University of Vermont
Image Source: Image credited to Anthony Kelly.

Original Research: “Giving Voice to Vulnerable Children: Machine Learning Analysis of Speech Detects Anxiety and Depression in Early Childhood.” Ellen W. McGinnis et al., Journal of Biomedical and Health Informatics. The study reports that speech features alone identified children with an internalizing disorder with 80% accuracy; sensitivity and specificity are given in the original manuscript.
