New software developed by researchers detects empathy, the ability to understand and share another person’s feelings, in a therapist’s speech during therapy sessions.
“And how does that make you feel?”
Empathy is widely recognized as the cornerstone of effective therapy. A team of researchers from the University of Southern California (USC), the University of Washington and the University of Utah has created software that can automatically detect whether a therapist’s speech signals high or low empathy during counseling sessions.
Using advances in automatic speech recognition (ASR), natural language processing (NLP) and machine learning, the team trained an algorithm on transcripts and audio from clinical sessions. The model evaluates speech patterns, word choices and vocal characteristics to assign an empathy score to each session, distinguishing between high-empathy and low-empathy interactions.
Why this matters
Traditional methods for evaluating therapy quality have changed little in decades. Observational coding by trained human raters remains the standard, but it is time-consuming, expensive and raises privacy concerns because multiple people must review sensitive session recordings. Automated detection offers a scalable alternative that can preserve confidentiality while providing rapid, objective feedback.
The researchers framed their work within behavioral signal processing, an emerging area that applies computational tools to behavioral science questions. Their goal was to determine whether a machine could reliably identify empathy—a subjective, “hidden” mental state—based solely on audio recordings and the words spoken during therapy.
How the system works
The study used a combination of datasets: a core set of Motivational Interviewing (MI) sessions and larger supporting collections of psychotherapy transcripts and recordings. ASR converted audio to text, and text-based models analyzed word use and phrase patterns. Additional speech-processing components evaluated acoustic features such as tone, prosody and conversational rhythm, enabling the algorithm to consider both what was said and how it was said.
Phrases the system associated with high empathy included “it sounds like,” “do you think,” and “what I’m hearing.” Phrases the model flagged as indicative of lower empathy included “next question,” “you need to,” and routine temporal markers such as “during the past.” Beyond keywords, the algorithm also incorporates cues from diction, voice quality and the way speakers mirror each other’s cadence.
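To make the phrase cues concrete, here is a minimal Python sketch that counts the six phrases quoted above in a list of therapist utterances and reduces them to a single balance value. It is only an illustration, not the researchers' model: the function names and the balance formula are assumptions made for this example, and the actual system learns from far richer language and acoustic features.

```python
# Illustrative only: a naive cue-phrase counter, NOT the study's model.
# The phrase lists come from the article; everything else is assumed.
HIGH_EMPATHY_CUES = ("it sounds like", "do you think", "what i'm hearing")
LOW_EMPATHY_CUES = ("next question", "you need to", "during the past")


def normalize(text: str) -> str:
    """Lowercase, replace curly apostrophes, and collapse whitespace."""
    return " ".join(text.lower().replace("\u2019", "'").split())


def count_cues(utterances, cues) -> int:
    """Count non-overlapping occurrences of each cue across all utterances."""
    total = 0
    for utterance in utterances:
        text = normalize(utterance)
        total += sum(text.count(cue) for cue in cues)
    return total


def cue_balance(utterances) -> float:
    """Return a value in [-1, 1]: positive when high-empathy cues dominate.

    Returns 0.0 when no cue phrase appears at all (hypothetical convention).
    """
    high = count_cues(utterances, HIGH_EMPATHY_CUES)
    low = count_cues(utterances, LOW_EMPATHY_CUES)
    if high + low == 0:
        return 0.0
    return (high - low) / (high + low)


if __name__ == "__main__":
    session = [
        "It sounds like this week was harder than you expected.",
        "Do you think talking to your sister helped?",
        "Next question.",
    ]
    print(cue_balance(session))
```

A counter like this ignores context entirely, which is one reason a trained model that also weighs prosody and conversational timing is needed in practice.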
The fully automatic pipeline, running from audio through ASR to prediction, was evaluated against human-coded empathy ratings. The computationally derived empathy scores correlated strongly with human judgments (correlation ≈ 0.65) and produced robust classification performance (F-score ≈ 0.86). Models that used human-generated transcripts showed only modest improvements, which suggests that the ASR-based system is relatively resilient to transcription errors.
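For readers unfamiliar with the two metrics, the sketch below shows how a correlation and an F-score can be computed when comparing machine output with human ratings. It assumes a Pearson correlation and the balanced F1 variant of the F-score, since the article does not specify which forms the study used, and the numbers in the usage example are made up purely to exercise the functions.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def f1_score(y_true, y_pred, positive=1) -> float:
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up toy values, not data from the study.
    human_scores = [2.0, 3.5, 5.0, 6.0, 4.0]
    model_scores = [2.5, 3.0, 4.5, 6.5, 3.5]
    human_labels = [0, 0, 1, 1, 1]   # 1 = high empathy (assumed encoding)
    model_labels = [0, 1, 1, 1, 0]
    print(pearson(human_scores, model_scores))
    print(f1_score(human_labels, model_labels))
```

The correlation compares continuous session scores, while the F-score applies to the binary high-versus-low decision, which is why the study reports both.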

Applications and future directions
In the short term, the team envisions using this technology as a training tool for trainee therapists. Immediate, objective feedback on empathy-related behaviors could accelerate learning and improve the quality of care. As David Atkins, a University of Washington research professor of psychiatry, notes, the ability to assess psychotherapy quality is critical to ensuring patients receive effective treatment.
Zac Imel, a University of Utah professor of educational psychology and the study’s corresponding author, emphasizes that combining engineering and psychological expertise may give clinicians new ways to monitor and improve their practice: the software could help providers receive rapid feedback and thereby enhance therapeutic effectiveness.
Longer-term goals include real-time feedback systems that rate sessions as they occur and expanded models that integrate additional acoustic channels and conversational metrics such as speaking frequency and turn-taking dynamics. Researchers at USC’s Signal Analysis and Interpretation Lab continue to refine models that more deeply analyze prosody, diction and the interactive timing between therapist and patient.
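As a rough idea of what conversational metrics of this kind might look like, the following sketch computes each speaker's share of talk time and number of turns from time-stamped, speaker-labeled segments. The input format, the function name and the definition of a turn are assumptions made for illustration; the article does not describe how the lab measures these quantities.

```python
from collections import defaultdict


def turn_taking_stats(segments):
    """Summarize talk time and turns per speaker.

    `segments` is a hypothetical input format: a list of
    (speaker, start_seconds, end_seconds) tuples sorted by start time.
    A new turn is counted whenever the speaker changes.
    """
    talk_time = defaultdict(float)
    turns = defaultdict(int)
    previous_speaker = None
    for speaker, start, end in segments:
        talk_time[speaker] += end - start
        if speaker != previous_speaker:
            turns[speaker] += 1
            previous_speaker = speaker
    total = sum(talk_time.values())
    talk_share = {s: t / total for s, t in talk_time.items()} if total else {}
    return {"talk_share": talk_share, "turns": dict(turns)}


if __name__ == "__main__":
    # Made-up example segments.
    session = [
        ("therapist", 0.0, 4.0),
        ("patient", 4.0, 10.0),
        ("patient", 10.5, 12.5),
        ("therapist", 13.0, 15.0),
    ]
    print(turn_taking_stats(session))
```

Summary values like these could be tracked alongside an empathy score, although which metrics prove useful is a question for the ongoing research.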
Primary contributors include Shrikanth S. Narayanan (USC), Zac E. Imel (University of Utah), David C. Atkins (University of Washington), Bo Xiao and Panayiotis G. Georgiou (USC). The work appears as “Rate My Therapist: Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing” in PLOS ONE, published December 2, 2015. The study reports that automated empathy prediction from audio recordings alone can support large-scale evaluation of psychotherapy for dissemination and process studies.
Source: Amy Blumenthal – USC
Original Research: “Rate My Therapist: Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing” by Bo Xiao, Zac E. Imel, Panayiotis G. Georgiou, David C. Atkins, and Shrikanth S. Narayanan. Published online December 2, 2015. DOI: 10.1371/journal.pone.0143055