Cheat on a Written Exam? AI Can Detect You Nearly 90% of the Time

Summary: Ghostwriter is a deep learning system that uses a Siamese neural network to compare writing styles and identify whether a student likely wrote a submitted assignment or had it written by someone else. The algorithm analyzes a student’s current text against past submissions and returns a percentage score indicating similarity in writing style, offering a tool to support authorship verification and detect potential ghostwriting.

Source: University of Copenhagen

Combining big data and artificial intelligence, researchers at the University of Copenhagen report that their Ghostwriter system can distinguish between student-written and ghostwritten assignments with nearly 90% accuracy.

Cheating on written assignments is common and appears to be increasing among high school students. At the Department of Computer Science at the University of Copenhagen, researchers have been developing methods that apply artificial intelligence to writing analysis in order to detect such misconduct. Using a dataset of 130,000 Danish high school assignments, the team has trained a model that, according to their results, can identify when a student’s work diverges from their established writing style with close to 90 percent accuracy.

High schools in Denmark currently use the Lectio platform to detect direct plagiarism—passages copied verbatim from previously submitted work. However, Lectio and similar tools cannot reliably detect when a student has had someone else write an entire assignment. This problem is particularly acute for major projects such as the SRP, a significant written assignment in the final year of Danish high school, where some students have sought outside help through online services and classified ads.

“Traditional plagiarism checkers look for verbatim matches, but they won’t flag a piece if it was written by a hired ghostwriter,” explains PhD student Stephan Lorenzen from the Department of Computer Science. “Ghostwriter identifies discrepancies in writing style by comparing a new submission to that student’s previous work. The system examines features such as average word length, sentence structure, and lexical choices—for example, whether a writer tends to use ‘for example’ in full or prefers abbreviations like ‘e.g.’” Lorenzen and the DIKU-DABAI research group recently presented these findings at a major European AI conference.
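
To make the kinds of features Lorenzen mentions concrete, here is a minimal Python sketch that computes a few of them for a single text. It is an illustration only, not Ghostwriter's code: the function name `style_features`, the choice of features, and the regular expressions are assumptions made for this example, and the real system learns its representation of style with a neural network rather than relying on hand-written rules.

```python
import re


def style_features(text: str) -> dict[str, float]:
    """Compute a few toy stylometric features (hypothetical helper, not from Ghostwriter)."""
    lowered = text.lower()

    # Count the abbreviation and the spelled-out phrase the article uses as an example.
    eg_count = len(re.findall(r"\be\.g\.", lowered))
    for_example_count = len(re.findall(r"\bfor example\b", lowered))

    # Mask "e.g." so its periods are not mistaken for sentence boundaries.
    masked = re.sub(r"\be\.g\.", "eg", lowered)

    sentences = [s for s in re.split(r"[.!?]+", masked) if s.strip()]
    words = re.findall(r"[a-z']+", masked)

    if not words or not sentences:
        # Nothing to measure in an empty text.
        return {
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "eg_count": float(eg_count),
            "for_example_count": float(for_example_count),
        }

    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "eg_count": float(eg_count),
        "for_example_count": float(for_example_count),
    }


if __name__ == "__main__":
    sample = "I like tea. For example, green tea. I drink it daily, e.g. at noon."
    for name, value in style_features(sample).items():
        print(f"{name}: {value:.2f}")
```

Comparing such feature values between a new submission and a student's earlier work gives a rough sense of what "a discrepancy in writing style" means, although a learned model can pick up far subtler patterns than these four numbers.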

Ethical considerations before deployment

Ghostwriter is based on machine learning and neural networks, techniques well suited to recognizing complex patterns in text. MaCom, the company behind the Lectio platform, provided the research team with an anonymized dataset of 130,000 written assignments from roughly 10,000 students. The project remains a research initiative for now.

Lorenzen believes that adoption of authorship-verification tools in schools is plausible as educational institutions respond to evolving technology-driven cheating methods. At the same time, he stresses the need for careful ethical deliberation. “I expect schools may adopt systems like this in time, but any automated result should never be used in isolation. The output should support and substantiate a suspicion and be followed up by human review and context-aware decision making,” he says.

Applications beyond schools

The same underlying technology can be applied outside education. Forensic document examiners and law enforcement could use authorship analysis to assist in cases involving forged documents, while journalists and researchers might apply similar methods to help assess disputed authorship. The team notes that such uses also require stringent ethical oversight to prevent misuse.

Lorenzen adds that the research group has already experimented with other datasets, including social media posts, to distinguish human users from bots or paid imposters. This versatility demonstrates the broader potential of authorship verification, while underscoring the importance of transparent, ethical use.

FACTS:

  • Ghostwriter uses a Siamese neural network to compare the writing styles of two texts. The network learns representations of writing style from large datasets and then measures similarity between examples.
  • When a student submits an assignment, the system compares the new text against that student’s earlier submissions. It produces a similarity percentage for each comparison.
  • A weighted average of those similarity scores is computed, incorporating additional contextual factors such as submission timing.
  • The final output is a percentage score that reflects how closely the new assignment matches the student’s established writing style.
  • The research group behind Ghostwriter is DIKU-DABAI (Danish Center for Big Data Analytics driven Innovation), led by Professor Stephen Alstrup.
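
The comparison steps listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' implementation: character trigram counts stand in for the learned Siamese encoder, cosine similarity stands in for the network's similarity output, and an exponential recency weight (the hypothetical `half_life_days` parameter) stands in for the contextual weighting, whose exact form the article does not describe. The names `encode`, `pair_similarity`, and `authorship_score` are invented for the example.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Stand-in for the shared encoder: character trigram counts (assumption)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def pair_similarity(a: str, b: str) -> float:
    """Siamese-style comparison: the SAME encoder is applied to both texts,
    and the two encodings are compared with cosine similarity (range 0 to 1)."""
    va, vb = encode(a), encode(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def authorship_score(
    new_text: str,
    past: list[tuple[str, float]],
    half_life_days: float = 365.0,
) -> float:
    """Weighted average of pairwise similarities, returned as a percentage.

    `past` holds (earlier_text, age_in_days) pairs. Newer submissions get more
    weight through an assumed exponential decay; the real weighting may differ.
    """
    if not past:
        raise ValueError("at least one earlier submission is required")
    weights = [0.5 ** (age / half_life_days) for _, age in past]
    sims = [pair_similarity(new_text, text) for text, _ in past]
    weighted_sum = sum(w * s for w, s in zip(weights, sims))
    return 100.0 * weighted_sum / sum(weights)


if __name__ == "__main__":
    earlier = [
        ("The results show, for example, that style is stable.", 30.0),
        ("We discuss, for example, how word choice varies.", 400.0),
    ]
    new = "Our findings show, for example, that style persists."
    print(f"style match: {authorship_score(new, earlier):.1f}%")
```

The design point worth noting is the shared encoder: because both texts pass through the same function, the similarity reflects differences between the texts rather than differences between two separately built models. A low score would only be a signal for human review, in line with Lorenzen's caution above.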

Funding: The research is supported by Innovation Fund Denmark.

About this research

Media contact:
Stephan Lorenzen – University of Copenhagen

Original research: A PDF copy of the full research report is available from the University of Copenhagen’s publications.
