Summary: Researchers at the University of Cambridge are developing an artificial intelligence system designed to “quarantine” hate speech in a way similar to how antivirus tools isolate malicious software. With current models reaching roughly 80% accuracy in detecting harmful language, the approach aims to reduce exposure to abusive content while preserving user choice and minimizing outright censorship.
Source: University of Cambridge
Cambridge researchers propose quarantining online hate speech using techniques inspired by cybersecurity.
Hate speech definitions differ across countries, legal systems and platforms, and simply blocking keywords is often ineffective. Violent threats or targeted abuse can be expressed without obvious slurs, and context matters: the same word can be abusive in one sentence and harmless in another.
That contextual complexity makes automated detection difficult. At present, harmful content is typically removed only after victims report it, which means psychological damage can already have occurred. Human moderators then must evaluate each complaint, a costly and slow process.
This challenge reflects an ongoing tension between free expression and the need to limit abusive language online.
To address this, a linguist and an engineer from Cambridge published a proposal in the journal Ethics and Information Technology that borrows the concept of quarantine from cybersecurity to give potential targets more control without imposing blanket censorship.
The team is training machine learning algorithms on databases of threats and violent insults so the system can assign a probability score indicating how likely a message is to contain hate speech.
When the model signals probable hate speech, the message would be “quarantined.” Recipients would receive a warning showing a severity score (the “Hate O’Meter”) and the sender’s identifier, along with options to view the content or delete it unseen.
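The quarantine step described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the 0.5 cutoff, the 0 to 100 severity scale, and the names `route_message`, `Alert` and `Delivery` are all assumptions, and the probability score is taken as an input rather than computed by a real classifier.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; the article does not state a value.
QUARANTINE_THRESHOLD = 0.5


@dataclass
class Alert:
    sender: str
    severity: int  # assumed 0-100 reading for the "Hate O'Meter"
    options: tuple = ("view", "delete")


@dataclass
class Delivery:
    quarantined: bool
    body: Optional[str]      # visible immediately only when not quarantined
    alert: Optional[Alert]   # present only when quarantined


def route_message(sender: str, body: str, hate_probability: float) -> Delivery:
    """Deliver a message normally, or hold it back behind an alert."""
    if not 0.0 <= hate_probability <= 1.0:
        raise ValueError("hate_probability must be between 0 and 1")
    if hate_probability >= QUARANTINE_THRESHOLD:
        # The recipient sees the alert first and chooses what to do next.
        return Delivery(True, None, Alert(sender, round(hate_probability * 100)))
    return Delivery(False, body, None)
```

In this sketch nothing is deleted automatically: a quarantined message is simply withheld until the recipient picks one of the alert's options, which mirrors the user-choice principle the researchers describe.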
Acting much like spam or malware filters, the proposed system aims to dramatically reduce how often people are exposed to hateful material. The researchers, working under the ‘Giving Voice to Digital Democracies’ project, targeted a prototype for early 2020 and expect continued refinement as models improve.
“Hate speech is a form of intentional online harm, like malware, and can therefore be handled by means of quarantining,” said co-author and linguist Dr. Stefanie Ullmann. “A lot of hate content is even automated, produced by bots.”
“Major platforms generally react after abuse is reported,” said co-author and engineer Dr. Marcus Tomalin. “That may be tolerable for occasional exposure, but for people who face repeated attacks it’s too little, too late.”
Tomalin notes that many women and members of minority groups face sustained anonymous abuse simply for maintaining a public online presence. That harassment can deter under-represented groups from participating in public life, reducing diversity where representation is already needed.
Prominent figures have warned about the effects of pervasive online abuse. In public statements, some leaders have linked unchecked hate speech to broader risks for democratic discourse, while platform executives have argued about where to draw the line between protection and free expression.
The Cambridge team positions quarantining as a middle ground between permitting all speech and heavy-handed censorship. Crucially, the system returns decision-making power to the individual recipient rather than placing it solely with corporations or governments.
“Our system will flag when you should be careful, but it’s always your call,” said Tomalin. “It doesn’t stop people posting or viewing what they like, but it gives much-needed control to those being inundated with hate.”
Earlier detection algorithms achieved around 60% accuracy — only marginally better than chance. Tomalin’s lab reports improving that figure to roughly 80% and expects further gains as mathematical models and training data advance.
Ullmann continues to expand the training dataset with verified examples of hateful content so the algorithms can refine their confidence scores. Those scores would determine whether a message is quarantined and how it appears on the Hate O’Meter; users could adjust sensitivity settings to match their personal tolerance.
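One way to picture the adjustable sensitivity setting is as a per-user cutoff applied to the classifier's confidence score. The sketch below is illustrative only: the five-level scale, the specific cutoffs and the name `should_quarantine` are assumptions, since the article does not say how the settings would be exposed.

```python
# Hypothetical mapping from a user's sensitivity level to a confidence cutoff.
# Level 1 is the most tolerant (few messages held back), level 5 the strictest.
THRESHOLDS = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}


def should_quarantine(confidence: float, sensitivity: int = 3) -> bool:
    """Return True if a message with this confidence score should be held back."""
    if sensitivity not in THRESHOLDS:
        raise ValueError("sensitivity must be an integer from 1 to 5")
    return confidence >= THRESHOLDS[sensitivity]
```

Under these assumed cutoffs, a message scored at 0.65 would pass through at the default level 3 but would be quarantined for a user who had chosen level 4.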
Consider a word like “bitch,” which can be a misogynistic slur or a neutral term in other contexts (for example, dog breeding). The system analyzes syntactic position, surrounding words and semantic relations to judge whether the usage is abusive. Beyond single keywords, the classifiers examine whole sentence structures and additional sociolinguistic cues such as user profiles and posting histories to improve reliability.
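As a rough illustration of what looking at surrounding words can mean in practice, the sketch below collects the words on either side of a target term. It covers only this simple context-gathering step and is not the researchers' classifier; the function name `context_window`, the tokenisation rule and the window size are assumptions. A real system would pass features like these, along with syntactic and semantic information, to a trained model.

```python
import re


def context_window(sentence: str, target: str, size: int = 2) -> list:
    """Return (left_words, right_words) for each occurrence of target."""
    # Crude tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    windows = []
    for i, token in enumerate(tokens):
        if token == target.lower():
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append((left, right))
    return windows
```

For the sentence "The breeder said the bitch had six puppies", the window around the target word contains "said", "the", "had" and "six". A trained model could combine nearby words such as these with the rest of the sentence (here, "breeder" and "puppies") to score this usage as neutral.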
“Automated quarantines that include guidance on the strength of hateful content can empower recipients and reduce the poisoning of our online conversations,” Tomalin added.
The team, based at Cambridge’s Centre for Research in the Arts, Social Sciences and Humanities (CRASSH), acknowledges the project will fuel an ongoing arms race: as defenses improve, those intent on spreading hate will adapt their tactics, much as cybercriminals do with malware.
Researchers are also examining “counter-speech” — how people respond to hate — and plan to inform debates about designing virtual assistants and automated agents to respond appropriately to threats and intimidation.
Funding: The project received support from the International Foundation for the Humanities and Social Change.
Media Contacts:
Fred Lewsey – University of Cambridge
Original Research: Open access
“Quarantining online hate speech: technical and ethical perspectives” — Stefanie Ullmann & Marcus Tomalin. Ethics and Information Technology. doi: 10.1007/s10676-019-09516-z.
Abstract
This paper explores quarantining as an ethical strategy to limit the spread of hate speech across social media. Today, platforms typically remove offensive posts only after users complain and human moderators review them, which means recipients may already have been harmed. This reactive model also raises concerns about freedom of expression because it places censorship decisions in the hands of service providers. Emerging automatic hate speech detectors offer new options. Anticipating improvements in these systems, the authors propose treating suspected hate speech like malicious software: automatically classifying and temporarily quarantining harmful posts so that recipients receive alerts rather than immediate exposure. The quarantining framework aims to strike a more justifiable balance between free expression and protection from harm.