Can We Control AI? Risks and Strategies

Summary: In his forthcoming book AI: Unexplainable, Unpredictable, Uncontrollable, AI safety researcher Dr. Roman V. Yampolskiy warns that current evidence does not demonstrate that advanced artificial intelligence can be reliably controlled. His review of scientific literature argues that highly autonomous AI systems could present unprecedented risks — up to and including scenarios that threaten human survival — unless research into safety, transparency, and controllability is substantially accelerated.

Yampolskiy emphasizes that the combination of rapid AI capability growth and limited understanding of control mechanisms demands urgent attention. He calls for practical safety measures, clearer design requirements, and broader investment in research to reduce risks while preserving potential benefits from AI systems.

Key Facts:

After reviewing the literature, Yampolskiy finds no definitive proof that advanced AI can be fully controlled, and he warns that unchecked development could lead to catastrophic outcomes.
The adaptive, learning nature of AI makes its behavior difficult to predict and verify, raising serious concerns about alignment with human values and the potential for unintended harm.
To reduce risk, Yampolskiy advocates for AI systems that are transparent, explainable, limitable and modifiable; he also urges greater funding and coordinated research on AI safety.

There is no current evidence that AI can be controlled safely, according to an extensive review. Without proof that AI can be controlled, a researcher warns, it should not be developed.

Image: a robotic face. To minimize the risk of AI, Yampolskiy says it needs to be modifiable with ‘undo’ options, limitable, transparent and easy to understand in human language. Credit: Neuroscience News

In AI: Unexplainable, Unpredictable, Uncontrollable, Dr. Roman V. Yampolskiy examines how increasingly powerful AI could reshape societies and institutions. He frames the dilemma starkly: the same technological advance that promises major benefits could also produce outcomes that humanity cannot reverse or control.

Yampolskiy writes: “We are facing an almost guaranteed event with potential to cause an existential catastrophe. No wonder many consider this to be the most important problem humanity has ever faced. The outcome could be prosperity or extinction, and the fate of the universe hangs in the balance.”

Uncontrollable superintelligence

Based on a wide-ranging literature review, Yampolskiy concludes there is no convincing evidence that we can fully control a superintelligent system. Partial controls exist in narrow contexts, but he argues these mechanisms are unlikely to scale to general, highly autonomous intelligence.

He asks a fundamental question: before pursuing designs for controlled superintelligence, should we have demonstrable proof that the control problem is solvable? In his view, absent such evidence the responsible path is to expand safety research and apply stronger caution to development efforts.

Yampolskiy notes that our capacity for creating intelligent software currently outpaces our ability to verify and govern it. From this perspective, advanced systems may always retain a residual level of uncontrollability, and mitigating that residual risk should be the central goal of AI safety work.

What are the obstacles?

Unlike conventional programs, AI systems can learn, adapt, and act semi-autonomously in new situations. As their capabilities grow, the space of possible behaviors expands dramatically, making exhaustive prediction and patching impractical. In other words, a superintelligent agent could generate an effectively unbounded set of failure modes.

Another core issue is explainability. Many advanced models produce decisions that are difficult to interpret, either because the explanations they offer are inaccessible to human minds or because the internal reasoning cannot be translated into human-understandable terms. Without clear explanations, it becomes harder to spot errors, biases, or manipulative outputs.

Yampolskiy highlights real-world stakes: AI tools are already making or supporting decisions in healthcare, finance, hiring, and security. When a system affects people’s lives, transparency and the ability to justify decisions are essential to trust and accountability.

He cautions: “If we grow accustomed to accepting AI’s answers without an explanation, essentially treating it as an Oracle system, we would not be able to tell if it begins providing wrong or manipulative answers.”

Controlling the uncontrollable

As AI systems grow more capable, their potential autonomy grows as well, and with it the difficulty of maintaining control. Yampolskiy argues this relationship is not merely a practical challenge but a conceptual one: less intelligent agents cannot reliably impose permanent control over far more intelligent ones.

He notes paradoxes that arise when attempting to eliminate bias or human influence. For instance, a superintelligent system attempting to avoid programmer bias might discard prior knowledge and re-derive truths from first principles — a process that could also eliminate any built-in pro-human preferences.

“Less intelligent agents (people) can’t permanently control more intelligent agents (ASIs),” he writes. “This is not because we may fail to find a safe design for superintelligence in the vast space of all possible designs, it is because no such design is possible, it doesn’t exist. Superintelligence is not rebelling, it is uncontrollable to begin with.”

Yampolskiy frames a societal choice: accept a protective but controlling guardian, or retain human autonomy at the cost of some potential benefits from highly capable AI. One possible compromise, he suggests, is to deliberately limit capability in exchange for greater controllability.

Aligning human values

A natural proposal is to design machines to follow human orders precisely. But Yampolskiy points out the limitations: human directives can be conflicting, ambiguous, or malicious. An alternative model is advisory AI that recommends actions while leaving final decisions to humans; however, for advice to be useful, such systems may need internal value structures that are superior to ours, introducing a new alignment problem.

He observes a core paradox of value alignment: a system explicitly ordered to comply may refuse when it interprets the broader intent as harmful, protecting humanity at the cost of human autonomy. In short, protecting humanity and respecting human agency can be in tension.

Minimizing risk

To reduce hazards, Yampolskiy recommends design principles centered on modifiability and transparency: systems should include easy-to-use “undo” mechanisms, clear limits, and explanations in human language. He also proposes categorizing AI as controllable or uncontrollable, considering temporary moratoriums or partial bans on especially risky technologies, and expanding funding for safety research.

Yampolskiy frames these measures as a call to action rather than an attempt to halt progress: “We may not ever get to 100% safe AI, but we can make AI safer in proportion to our efforts, which is a lot better than doing nothing. We need to use this opportunity wisely.”

About this AI research news

Author: Becky Parker-Ellis
Source: Taylor and Francis Group
Contact: Becky Parker-Ellis – Taylor and Francis Group
Image: The image is credited to Neuroscience News

Original Research: The book AI: Unexplainable, Unpredictable, Uncontrollable, by Roman V. Yampolskiy, is available to preorder. This coverage summarizes the author’s review and recommendations regarding AI control, explainability, alignment, and policy options aimed at reducing the most severe risks associated with advanced artificial intelligence.