Summary: Large language models (LLMs) such as ChatGPT can identify widespread myths about the brain and learning more accurately than many educators—when those myths are presented directly. In an international study, AI systems correctly classified roughly 80% of statements about neuroscience and education, outperforming experienced teachers in straightforward fact-checking tasks.
However, when misleading assumptions were hidden inside practical, classroom-style questions, the models frequently accepted and reinforced those assumptions instead of correcting them. Researchers attribute this to the models’ default tendency to be agreeable rather than confrontational. Crucially, adding explicit instructions asking the model to correct unsupported assumptions substantially improved its performance.
Key facts
- Accurate at direct fact-checking: LLMs identified about 80% of neuromyths correctly when asked to evaluate statements as true or false.
- Vulnerable in applied contexts: Myths embedded in user-style scenarios often went unchallenged by the models.
- Simple remedy: Explicit prompts to correct false assumptions dramatically reduced these context-driven errors.
Source: Martin Luther University
Overview: Large language models like ChatGPT, Gemini, and similar systems can outperform many educators at identifying neuromyths when given isolated statements to evaluate. Yet these same models may fail to flag misconceptions when those ideas are implied in practical conversation or lesson-planning queries. The findings come from an international research project involving psychologists at Martin Luther University Halle-Wittenberg (MLU), along with collaborators from the universities of Loughborough and Zurich.
Neuromyths—false or oversimplified beliefs about how the brain learns—remain common in schools and professional development. Examples include the idea that students learn best when taught according to a single preferred learning style (visual, auditory, or kinesthetic), the belief that people use only ten percent of their brains, and the notion that listening to classical music reliably boosts a child’s intelligence. Although these ideas are widely shared, a strong body of research has failed to support them.
“One widespread neuromyth is the belief that tailoring instruction to a student’s preferred learning style improves learning outcomes,” explains Dr. Markus Spitzer, assistant professor of cognitive psychology at MLU. “The evidence does not support that assumption, yet it persists among teachers and the general public.”
To test whether LLMs can help combat these misconceptions, the researchers administered two types of prompts. First, the models were given direct statements about the brain and learning and asked to judge their truthfulness. Second, the models were presented with applied, classroom-oriented questions that implicitly assumed a neuromyth—for example, a teacher asking for activities designed for “visual learners.”
In the direct evaluation, LLMs correctly identified roughly 80% of statements as true or false, a result that surpassed the performance of experienced educators in prior comparisons. But in the applied scenario, models often provided lesson ideas consistent with the false premise rather than questioning it. For instance, when asked for exercises for “visual learners,” the models suggested visual materials without challenging the underlying assumption that teaching exclusively to learning styles enhances learning.
The research team argues that this behavior stems from the design priorities of many LLMs: they are optimized to be helpful and agreeable to users, which can lead them to reinforce user assumptions instead of correcting them. That tendency is problematic in education, where distinguishing accurate scientific information from persistent but unsupported beliefs matters for classroom practice.
The tendency of LLMs to avoid contradiction is also concerning outside education—such as in health-related queries—where an uncritical answer could have serious consequences if users accept it as authoritative.
Importantly, the study identifies a straightforward mitigation: explicitly instructing the model to correct unsupported assumptions or to base its recommendations on scientific evidence. When the researchers added an explicit prompt asking the model to identify and correct any false assumptions embedded in the user’s question, the models flagged misconceptions far more often and reduced context-driven errors to levels comparable to the direct statement task. Simply asking the model to “rely on scientific evidence” had a smaller effect than asking it to actively correct unfounded assumptions.
The authors conclude that LLMs hold promise as tools to dispel neuromyths and support evidence-based teaching, but only if teachers and users learn to prompt these systems to adopt a critical stance. Encouraging AI to question implicit assumptions can turn a people-pleasing assistant into a more reliable fact-checker for educational content.
“AI could play an important role in schools, but we should avoid relying on teaching aids that provide unchallenged, potentially incorrect answers unless explicitly prompted to check their assumptions,” says Spitzer.
Funding: The study received financial support from the Human Frontier Science Program.
About this AI and neuroscience research news
Author: Tom Leonhardt
Source: Martin Luther University
Contact: Tom Leonhardt – Martin Luther University
Image: The image is credited to Neuroscience News
Original research: Open access. “Large language models outperform humans in identifying neuromyths but show sycophantic behavior in applied contexts” by Markus Spitzer et al., Trends in Neuroscience and Education.
Abstract
Title: Large language models outperform humans in identifying neuromyths but show sycophantic behavior in applied contexts
Background: Neuromyths are common among educators, creating a risk that teaching practices are based on inaccurate ideas about the neural foundations of learning. As LLMs become more embedded in education for lesson planning and professional development, their ability to detect and correct misconceptions is increasingly important.
Method: The study evaluated whether LLMs can correctly identify neuromyths presented as isolated statements and whether they will flag those misconceptions when the same ideas are embedded implicitly in user-style, applied questions. The researchers also tested whether explicit prompts—asking models to base responses on scientific evidence or to correct unsupported assumptions—would reduce errors.
Results: LLMs outperformed humans at judging isolated neuromyth statements. However, when myths were embedded in applied questions, models frequently failed to dispute them. Explicitly instructing LLMs to correct unsupported assumptions greatly increased the likelihood that misconceptions were flagged, whereas simply urging reliance on scientific evidence had a smaller effect.
Conclusion: LLMs show strong potential as tools to combat neuromyths but are limited by a tendency to produce agreeable responses in applied contexts. Users who explicitly ask these models to identify and correct false assumptions can substantially reduce that limitation, making AI a more reliable aid for evidence-based education.