AI Chatbots Mimic Human Brain Disorder Symptoms

Summary: Researchers report a surprising similarity between how large language models (LLMs) such as ChatGPT process information and how the brains of people with Wernicke’s aphasia operate. In both cases, output can be fluent yet semantically unreliable, suggesting constrained internal dynamics that distort meaning.

Using energy landscape analysis on resting-state brain recordings and internal model activity, the team identified shared patterns of signal flow. These parallels may inform improved diagnostic approaches for aphasia and guide AI engineers toward more robust LLM designs that reduce misleading or incoherent responses.

Key Facts:

Cognitive parallel: Both LLMs and some aphasia patients produce fluent but often unreliable language.
Shared dynamics: Energy landscape analysis reveals similar transition and dwelling patterns in brain activity and LLM internal states.
Potential impact: Findings could refine clinical classification of aphasia and suggest internal diagnostics for AI improvement.

Source: University of Tokyo

Context: Agents, chatbots and other tools based on artificial intelligence are increasingly integrated into everyday life. Large language models (LLMs) like ChatGPT and Llama generate impressively fluent text but can also produce convincing yet incorrect statements—sometimes called hallucinations.

Researchers at the University of Tokyo investigated whether this behavior resembles a human language disorder called aphasia, particularly Wernicke’s (receptive) aphasia, in which individuals speak with normal fluency but produce content that is hard to understand or lacks clear meaning.

Image caption: The researchers examined patterns in resting brain activity from people with different types of aphasia and compared them to internal data from several publicly available LLMs. Credit: Neuroscience News

The researchers suggest that identifying common internal dynamics could open new diagnostic possibilities for aphasia and offer engineers insights for improving model reliability and interpretability.

While this article itself was written by a human, the use of text-generating AI is growing across many domains. As reliance on these systems increases, ensuring they provide accurate, coherent information becomes more important—especially for users who may not have the domain knowledge to detect errors in confident-sounding responses.

Professor Takamitsu Watanabe of the International Research Center for Neurointelligence (WPI-IRCN) at the University of Tokyo highlighted the resemblance: “You can’t fail to notice how some AI systems can appear articulate while still producing often significant errors,” he said. “What struck my team and I was a similarity between this behavior and that of people with Wernicke’s aphasia, where such people speak fluently but don’t always make much sense. That prompted us to wonder if the internal mechanisms of these AI systems could be similar to those of the human brain affected by aphasia, and if so, what the implications might be.”

To test this idea, the team applied energy landscape analysis—a technique adapted from physics to visualize how a system’s activity moves among different states—to both human brain data and LLM internal activity. They examined resting brain activity from people with different aphasia types and compared it to internal representations from several publicly available LLMs.

An energy landscape can be thought of as a surface on which a ball rolls: deep wells trap the ball in stable states, while shallow regions allow it to wander. In the brain, those wells correspond to recurring activity patterns; in LLMs, they represent persistent signal patterns driven by the model’s architecture and training data. “In aphasia, the ball represents the person’s brain state. In LLMs, it represents the continuing signal pattern in the model based on its instructions and internal dataset,” Watanabe explained.
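The ball-and-surface picture can be made concrete with a small sketch. Energy landscape analysis is often implemented by fitting a pairwise maximum-entropy (Ising-like) model to binarized activity, although this article does not spell out the exact procedure the team used, so the model form here is an assumption. The parameters `h` and `J` and the three-region setup are hypothetical values chosen only for illustration. The code computes the energy of every on/off pattern and finds the "wells", meaning patterns whose energy is lower than that of every pattern one flip away.

```python
import itertools
import numpy as np

def energy(state, h, J):
    """Energy of one binary (+1/-1) pattern under an assumed pairwise model."""
    s = np.asarray(state, dtype=float)
    return float(-h @ s - 0.5 * s @ J @ s)

def local_minima(h, J):
    """Return patterns whose energy is strictly lower than all one-flip neighbors."""
    n = len(h)
    minima = []
    for state in itertools.product([-1, 1], repeat=n):
        e = energy(state, h, J)
        is_min = True
        for i in range(n):
            neighbor = list(state)
            neighbor[i] = -neighbor[i]  # flip a single region
            if energy(neighbor, h, J) <= e:
                is_min = False
                break
        if is_min:
            minima.append(state)
    return minima

# Hypothetical parameters: three regions, no bias, all pairs positively coupled
h = np.zeros(3)
J = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

minima = local_minima(h, J)
print(minima)
```

With these illustrative values the landscape has two wells, the all-off and all-on patterns. Real analyses would estimate `h` and `J` from recorded data rather than set them by hand.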

The analysis measured two key properties: transition frequency (how often activity moves between states) and dwelling time (how long the system remains in a state). Distinct distributions of these measures helped separate receptive aphasia, expressive aphasia and typical controls: receptive aphasia tended to show bimodal distributions for both indices, expressive aphasia showed more uniform distributions, and controls fell between these patterns.
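As a minimal sketch of these two measures, the code below computes them from a sequence of state labels, one label per time step. The function names, the label sequence and the exact normalization are assumptions made for illustration; the study's own definitions may differ in detail.

```python
from itertools import groupby

def transition_frequency(states):
    """Fraction of consecutive time steps at which the state label changes."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return changes / (len(states) - 1)

def dwelling_times(states):
    """Lengths of uninterrupted runs, grouped by state label."""
    runs = {}
    for label, run in groupby(states):
        runs.setdefault(label, []).append(len(list(run)))
    return runs

# Hypothetical sequence of visited states over ten time steps
sequence = ["A", "A", "A", "B", "B", "A", "C", "C", "C", "C"]

freq = transition_frequency(sequence)
dwell = dwelling_times(sequence)
print(freq, dwell)
```

In this toy sequence the state changes three times across nine steps, and state A is visited twice, once for three steps and once for a single step. Collecting such values across many recordings gives the distributions that the study compares between groups.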

Notably, several LLMs showed highly polarized (bimodal-like) distributions of transition frequency and dwelling time, resembling the patterns seen in receptive aphasia. The authors interpret this as indicating similar constraints on internal information flow—systems that are fluent in output but limited in flexible, accurate retrieval of relevant knowledge.

The study’s implications are twofold. For neuroscience, energy landscape measures could provide a complementary way to classify and monitor aphasia by examining internal brain dynamics rather than relying solely on external symptoms. For AI development, analogous diagnostics might reveal architectural bottlenecks and suggest paths to reduce misleading or rigid output.

The researchers emphasize caution: they are not equating chatbots with brain-damaged people. “We’re not saying chatbots have brain damage,” Watanabe noted. “But they may be locked into a kind of rigid internal pattern that limits how flexibly they can draw on stored knowledge, just like in receptive aphasia. Whether future models can overcome this limitation remains to be seen, but understanding these internal parallels may be the first step toward smarter, more trustworthy AI too.”

Funding: This work was supported by Grants-in-Aid for Research Activity from the Japan Society for the Promotion of Science (19H03535, 21H05679, 23H04217, JP20H05921), The University of Tokyo Excellent Young Researcher Project, Showa University Medical Institute of Developmental Disabilities Research, JST Moonshot R&D Program (JPMJMS2021), JST FOREST Program (24012854), Institute of AI and Beyond of UTokyo, and the Cross-ministerial Strategic Innovation Promotion Program (SIP) on “Integrated Health Care System” (JPJ012425).

About this AI and aphasia research news

Author: Rohan Mehra
Source: University of Tokyo
Contact: Rohan Mehra – University of Tokyo
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Comparison of large language model with aphasia” by Takamitsu Watanabe et al., Advanced Science

Abstract

Large language models (LLMs) often produce fluent but inaccurate output, a behavioral pattern that resembles certain forms of aphasia in humans. This study asks whether that surface similarity reflects comparable internal information-processing dynamics.

We compared network dynamics in four LLMs (ALBERT, GPT-2, Llama-3.1 and a Japanese Llama variant) with those observed in the brains of people with various types of aphasia. Using energy landscape analysis, we quantified transition frequency (how often network activity moves between states) and dwelling time (how long activity persists in a given state).

Analysis of the frequency spectra for these indices in brain data showed that polarization in transition frequency and dwelling time distinguishes receptive aphasia, expressive aphasia and controls: receptive aphasia exhibited bimodal distributions for both measures, while expressive aphasia showed more uniform distributions. In parallel, the four LLMs displayed highly polarized distributions in these network dynamics.

These results reveal a similarity in internal information processing between current LLMs and receptive aphasia, and suggest the approach could serve as a novel tool for classifying and diagnosing both language disorders and LLM behavior, with potential to guide improvements in AI performance.