AI Video-Trained Robots Bring Autonomous Surgery Closer

Summary: For the first time, researchers have trained a surgical robot to carry out procedures by watching hours of footage of expert surgeons. This advance in imitation learning allows robots to learn complex surgical tasks from video demonstrations instead of relying on hand-coded instructions for every movement, moving robotic surgery closer to practical autonomy.

Using archived footage captured from wrist-mounted cameras on da Vinci surgical systems, the team taught a model to predict the precise robotic motions needed for common surgical tasks. The trained robot performed needle manipulation, tissue lifting, and suturing with skill comparable to human surgeons, and in some cases corrected its own mistakes without having been explicitly programmed to do so. The researchers say this approach could dramatically speed up the training of surgical robots and expand the range of procedures they can perform safely and reliably.

The system applies the same foundational machine learning concepts behind large language models but translates them into the “language” of robotic kinematics — mathematical descriptions of angles, positions, and motion. By treating sequences of robotic actions like sequences of words, the model learns to predict the next action from visual input, enabling flexible, generalizable control.
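The language-model analogy can be made concrete with a deliberately tiny sketch. It assumes motions are reduced to one dimension and discretized into integer "words" (the bin size and demonstration values are hypothetical), and it uses simple bigram counts in place of a neural network. It is not the team's model and ignores visual input entirely; it only shows what "predict the next action token from the previous ones" means.

```python
from collections import Counter, defaultdict


def tokenize(deltas, bin_size=0.5):
    """Map each continuous 1-D motion step to an integer "word" by rounding
    it to the nearest multiple of bin_size (a hypothetical discretization)."""
    return [int(round(d / bin_size)) for d in deltas]


def fit_bigram(token_seqs):
    """Count how often each action token follows each other token."""
    counts = defaultdict(Counter)
    for seq in token_seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Two hypothetical demonstrations, each a sequence of 1-D motion steps.
demos = [
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0, 1.0, 0.5],
]
counts = fit_bigram([tokenize(d) for d in demos])

# Greedy rollout: start from token 0 and repeatedly take the likeliest next token.
rollout = [0]
for _ in range(3):
    rollout.append(predict_next(counts, rollout[-1]))
# rollout == [0, 1, 2, 1]
```

Scaling the idea up would mean replacing the count table with a learned model that conditions on camera images as well as on the actions that came before.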

Key Facts:

Robots trained through video imitation performed surgical tasks as effectively as human surgeons in controlled tests.
Imitation learning significantly accelerates robot training, allowing models to generalize from relatively few demonstrations.
The model adapts large-scale machine learning architectures to robotic kinematics rather than text, enabling motion prediction from image input.

Source: Johns Hopkins Medicine

Robots learned surgical tasks by watching videos of expert surgeons and then performed them with human-level skill.

The researchers fed their model hundreds of videos recorded from wrist cameras placed on the arms of da Vinci robots during surgical procedures. Credit: Neuroscience News

“It’s really remarkable: we just provide camera images and the model predicts the required robotic movements,” said senior author Axel Krieger. The research team highlights this result as a major step toward a new era in medical robotics, where data-driven learning replaces labor-intensive hand-coding for each surgical gesture.
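In its simplest form, learning to predict robotic movements from camera images is behavior cloning: supervised regression from observations to the actions an expert took. The sketch below assumes each camera frame has already been reduced to a short feature vector and uses synthetic data with a linear least-squares policy. The feature size, data, and noise level are hypothetical; the team's model is far larger and trained on real video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration data: each row of X stands in for features
# extracted from one camera image; each row of Y is the expert's recorded
# motion (dx, dy, dz) at that moment.
n_demos, n_features = 200, 8
true_W = rng.normal(size=(n_features, 3))  # the unknown expert "policy"
X = rng.normal(size=(n_demos, n_features))
Y = X @ true_W + 0.01 * rng.normal(size=(n_demos, 3))  # small recording noise

# Behavior cloning: choose W to minimize the squared error between the
# policy's predicted actions and the expert's actions.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)


def policy(features: np.ndarray) -> np.ndarray:
    """Predict a motion command from image features."""
    return features @ W


# Evaluate on new observations that were not in the demonstrations.
X_new = rng.normal(size=(50, n_features))
err = np.abs(policy(X_new) - X_new @ true_W).max()
```

A linear policy is chosen only because its fit is easy to verify; real camera images call for a deep network, but the training signal (match the expert's action for each observation) is the same.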

The work, led by Johns Hopkins University with contributions from Stanford University researchers, was featured at the Conference on Robot Learning in Munich, one of the premier gatherings for robotics and machine learning. The group focused on three foundational tasks: needle manipulation, tissue lifting, and suturing — core skills that underpin a wide range of surgical procedures.

To train the model, the researchers used hundreds of videos captured by wrist cameras on da Vinci systems. Because such recordings are routinely made during operations for later review and training, the potential pool of data is vast and diverse: with nearly 7,000 da Vinci robots in use worldwide and more than 50,000 surgeons trained on the platform, a large archive of surgical footage already exists for training imitation-based models.

The da Vinci system, while widely adopted, reports imprecise kinematic data. The researchers addressed this by training the model to predict relative movements, meaning how far to move from the current position, rather than absolute positions, which reduced the impact of those inaccuracies and improved robustness. “All we need is image input and then this AI system finds the right action,” said lead author Ji Woong “Brian” Kim. The team found that a few hundred demonstrations are often enough for the model to learn a task and generalize to new surgical environments.
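Why relative movements help can be illustrated with a minimal sketch. It assumes the robot's position error behaves like a constant offset, which is a simplification (the source does not characterize the da Vinci's errors this precisely), and the trajectory and offset values are hypothetical. Under that assumption the offset corrupts every absolute target but cancels out of each step-to-step difference.

```python
import numpy as np


def to_relative_actions(positions: np.ndarray) -> np.ndarray:
    """Convert absolute tool positions (T x 3) into per-step relative
    motions (T-1 x 3): delta_t = p[t+1] - p[t]."""
    positions = np.asarray(positions, dtype=float)
    return positions[1:] - positions[:-1]


def apply_relative_actions(start: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Execute relative motions from a starting position, returning the
    visited positions (T x 3)."""
    start = np.asarray(start, dtype=float)
    return np.vstack([start, start + np.cumsum(deltas, axis=0)])


# Hypothetical true tool path (arbitrary units).
true_path = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.5, 0.0],
                      [2.0, 1.0, 0.2],
                      [2.5, 1.5, 0.4]])

# Hypothetical calibration error: every reading is shifted by the same offset.
bias = np.array([3.0, -2.0, 1.5])
measured_path = true_path + bias

# Absolute targets learned from the readings inherit the full offset...
abs_error = np.abs(measured_path - true_path).max()
# ...while the relative motions match the true ones, because the offset cancels.
rel_error = np.abs(to_relative_actions(measured_path)
                   - to_relative_actions(true_path)).max()
```

Real kinematic errors can drift or depend on the arm's configuration, so relative actions reduce rather than eliminate their effect.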

In tests, the trained model matched surgeons’ performance on all three tasks. The researchers also observed emergent behavior: when the robot dropped a needle, it picked the needle back up and resumed the task on its own, a capability the team had never explicitly programmed. Such adaptability, learned from demonstrations rather than scripted, is a promising sign for scaling to more complex procedures.

Previously, automating surgical actions required painstaking hand-coding of each step; developing a reliable model for even a single suturing variant could take years. Imitation learning can instead compress training time to days once suitable demonstration data have been collected. That speed could accelerate progress toward safe autonomous surgical systems, with the broader aim of reducing errors and improving precision.

Other Johns Hopkins authors are PhD student Samuel Schmidgall, Associate Research Engineer Anton Deguet, and Associate Professor Marin Kobilarov; Stanford contributors include PhD student Tony Z. Zhao. The authors are continuing work to extend imitation learning from isolated tasks to complete surgical procedures.

About this robotics and AI research news

Author: Jill Rosen
Source: Johns Hopkins University (JHU)
Contact: Jill Rosen – Johns Hopkins University
Image: Image credited to Neuroscience News

Original Research: Findings presented at the Conference on Robot Learning