Why Do AI Chatbots Still Sound Fake?

Summary: A recent study comparing human and AI-generated conversations shows that large language models such as ChatGPT and Claude still struggle to reproduce the subtle features of natural spoken dialogue. Although these models can produce grammatically correct and logically coherent responses, they tend to over-imitate their conversation partners (a pattern the researchers call “exaggerated alignment”), misuse small conversational markers like “well” and “like,” and mishandle natural openings and closings. These patterns make AI conversations sound artificial despite technical fluency. The research suggests that some social subtleties in human interaction remain challenging for current conversational AI.

Researchers emphasize that while conversational AI is improving quickly, the mismatch in social rhythm, timing, and use of discourse markers distinguishes machine-generated dialogue from everyday human speech.

Key Facts:

  • Exaggerated imitation: AI models match their partners’ language patterns too closely and too quickly, producing behavior humans detect as unnatural.
  • Filler word misuse: Models often misplace or overuse discourse markers such as “so,” “well,” “like,” and “anyway,” which disrupts conversational flow.
  • Poor transitions: AI frequently misses the subtle openings and closings that frame human conversations, making exchanges feel abrupt or scripted.

Source: NTNU

Artificial intelligence can be impressive, and many people use large language models like ChatGPT, Copilot, and Perplexity to assist with tasks or for entertainment. Yet when it comes to simulating natural human conversation, recent research shows important limitations.

“Large language models speak differently than people do,” said Associate Professor Lucas Bietti from the Norwegian University of Science and Technology’s (NTNU) Department of Psychology. Bietti is one of the authors of a paper recently published in Cognitive Science. The lead author is Eric Mayor from the University of Basel, and Adrian Bangerter from the University of Neuchâtel is the final author.

Tested several models

The team compared several prominent language models—GPT-4 (ChatGPT-4), Claude Sonnet 3.5, and two open-source models, Vicuna and Wayfarer—against real human telephone conversations. Their evaluation proceeded in two stages:

  • First, they generated model transcripts using prompts designed to mirror the instructions given to participants in the Switchboard (SB) telephone corpus, and compared these transcripts to authentic human conversations.
  • Second, they tested whether human raters could reliably distinguish between human and model-generated transcripts.

Human judges were generally able to tell model transcripts from real conversations. The researchers then analyzed which features revealed the artificial nature of AI conversations.

Too much imitation

People naturally adapt their speech to each other in subtle ways. This alignment is usually restrained and woven into the flow of interaction. The study found that language models were overly eager to align with their partners’ wording and phrasing. This stronger, faster imitation—labelled “exaggerated alignment”—stands out to human listeners and readers as non-human.
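To make the idea of lexical alignment concrete, here is a minimal Python sketch. It is not the metric used in the study: it assumes alignment can be roughly approximated as the share of a turn's distinct words that already appeared in the previous turn. The function names and the two example exchanges are invented for illustration.

```python
import re


def tokenize(turn: str) -> list[str]:
    """Lowercase a turn and split it into word tokens."""
    return re.findall(r"[a-z']+", turn.lower())


def lexical_alignment(turns: list[str]) -> float:
    """Mean share of each turn's distinct words that also occur in the previous turn.

    Assumption: this word-overlap ratio is a simple stand-in for lexical
    alignment, not the measure reported in the paper.
    """
    scores = []
    for prev, curr in zip(turns, turns[1:]):
        prev_words = set(tokenize(prev))
        curr_words = set(tokenize(curr))
        if curr_words:
            scores.append(len(curr_words & prev_words) / len(curr_words))
    return sum(scores) / len(scores) if scores else 0.0


# Invented example exchanges (not taken from Switchboard or any model output).
restrained = ["Do you do much gardening?", "Not really, we only have a balcony."]
echoing = ["Do you do much gardening?", "Yes, I do much gardening."]

print(lexical_alignment(restrained))
print(lexical_alignment(echoing))
```

Under this rough measure, the second exchange scores higher because the reply reuses most of the question's wording, which is the kind of pattern the researchers describe as exaggerated.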

Incorrect use of filler words

Natural dialogue relies heavily on small discourse markers such as “so,” “well,” “like,” and “anyway.” These markers do more than fill silence: they signal attitude, interest, turn-taking, and the structure of the exchange. The study found that LLMs often use these markers differently from humans: they overuse them, place them awkwardly, or omit them where they would normally appear. Such misuse undermines the social and pragmatic cues that make conversation feel natural.

“The large language models use these small words differently, and often incorrectly,” Bietti said. That difference helps listeners detect that a conversation was generated rather than lived.
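One simple way to compare how often such markers appear in two transcripts is to count them per 100 words. The Python sketch below is an illustration under stated assumptions, not the study's method: the marker list contains only the four words named in this article, the function name is hypothetical, and the count includes every occurrence of a word, even when it is not acting as a discourse marker (for example, “like” in “I like tea”).

```python
import re
from collections import Counter

# Assumption: only the four markers named in the article are tracked.
MARKERS = ("so", "well", "like", "anyway")


def marker_rates(transcript: str) -> dict[str, float]:
    """Return occurrences of each marker per 100 word tokens.

    Counts surface forms only; it does not check whether a word is
    actually functioning as a discourse marker in context.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {m: (100 * counts[m] / total if total else 0.0) for m in MARKERS}


# Invented example sentence with ten word tokens.
sample = "Well I think so anyway it was like really good"
print(marker_rates(sample))
```

Frequency alone says nothing about placement, which the article also names as a problem. Judging whether a marker sits naturally in a turn would require looking at its position and function, which this sketch does not attempt.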

Opening and closing features

Human conversations typically begin with a brief social preface—greetings or small talk—before shifting to the main topic. That transition usually happens fluidly and implicitly. Similarly, people rarely end a conversation immediately after exchanging the necessary information; they close with brief phatic phrases like “alright, then,” “talk to you later,” or “see you soon.” The analysis showed that current models struggle with these natural openings and closings, tending to start and stop conversations in ways that feel abrupt or formulaic.

Better in the future? Probably

Taken together, these features led the authors to conclude that modern large language models are not yet able to simulate spoken human conversation convincingly on a consistent basis. “Today’s large language models are not yet able to imitate humans well enough to consistently fool us,” Bietti said.

The field is evolving rapidly, and future improvements—more training on spoken dialogue, better modeling of timing and turn-taking, and refinements in pragmatic behavior—may reduce these gaps. Still, the researchers note that some core differences in social intent, empathy, and real-time coordination might remain difficult for AI to fully replicate.

For now, conversational AI can perform many useful functions and produce fluent text, but it still falls short of reliably reproducing the social subtleties of everyday spoken interaction.

Key Questions Answered:

Q: Why do AI conversations sound unnatural?

A: AI tends to over-imitate its partner and lacks subtle conversational cues—timing, phrasing, and social rhythm—that give human speech its natural flow.

Q: What specific mistakes give AI away?

A: Incorrect use of filler words, abrupt or formulaic openings and closings, and overly close imitation of the conversation partner are common giveaways.

Q: Will AI ever sound fully human?

A: Possibly. Improvements are likely to narrow the gap, but researchers caution that some elements—like genuine empathy and real-time social intent—may remain distinguishing features of human interaction.

About this AI research news

Author: Nancy Bazilchuk
Source: NTNU
Contact: Nancy Bazilchuk – NTNU

Original Research: Open access.
“Can Large Language Models Simulate Spoken Human Conversations?” by Eric Mayor et al., Cognitive Science


Abstract

Can Large Language Models Simulate Spoken Human Conversations?

Large language models (LLMs) can emulate many aspects of human cognition and are often described as a potential paradigm shift. While they perform well in chat-style exchanges, their ability to replicate spoken conversation is less understood. This research examined whether LLMs can simulate spoken human conversation by comparing transcripts from the Switchboard telephone corpus with transcripts generated by GPT-4, Claude Sonnet 3.5, Vicuna, and Wayfarer under prompts intended to mirror Switchboard instructions.

The analysis focused on alignment (conceptual, syntactic, lexical), use of coordination markers (discourse markers and filler words), and coordination of openings and closings. Study 1 documented both quantitative and qualitative differences, including exaggerated alignment and atypical use of coordination markers. Study 2 tested whether humans could distinguish LLM transcripts from human transcripts; LLM conversations did not consistently pass for human conversations. The findings indicate that spoken conversations generated by LLMs remain both qualitatively and quantitatively distinct from human speech. While future models and targeted training on spoken interaction could reduce these gaps, fundamental differences between chat-based generation and real-time spoken interaction may continue to present challenges.