How AI Robots Use Touch and Vision to Handle Objects Like Humans

Summary: Researchers have developed a new method that allows robots to combine sight and touch to handle objects with greater accuracy and adaptability. The system, called TactileAloha, fuses visual, tactile, and proprioceptive data to enable bimanual robotic manipulation that better reflects how humans use multiple senses when interacting with everyday items.

Where vision-only systems often fail—for example, when a task requires sensing texture, adhesiveness, or which side of an object is facing outward—TactileAloha uses tactile feedback alongside cameras so robots can make more informed, responsive decisions. This multimodal approach improved success rates on difficult tasks such as fastening Velcro and inserting zip ties, and represents a step toward robots that can assist reliably with routine household and workplace tasks.

Key Facts:

Tactile Integration: Merges tactile sensors with visual and proprioceptive inputs to capture texture and contact details that cameras alone miss.
Adaptive Performance: Demonstrates better handling of texture-dependent tasks like Velcro fastening and zip tie insertion compared to vision-only methods.
Everyday Potential: Brings robotics closer to practical assistance in settings such as homes, kitchens, and care environments by improving robustness in real-world manipulation.

Source: Tohoku University

Human grasping feels effortless: we see where an object is and feel it as we touch it, instantly combining those senses to adjust our grip. Recreating that capability in robots remains difficult because tactile cues—texture, stickiness, or front-versus-back orientation—are often invisible to cameras alone.

An international research team addressed this gap by extending an existing dual-arm teleoperation platform with tactile sensing and a transformer-based policy. Built on the open-source ALOHA framework, their system captures fine-grained contact information with a gripper-mounted tactile sensor and fuses it with visual and proprioceptive data. This integrated approach enables robots to perform bimanual operations that require nuance and dexterity.
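To make the idea of fusing three input streams concrete, here is a minimal sketch in plain Python. It is not the team's implementation: the `Observation` class, the `project` helper (a toy stand-in for a learned projection layer), and the token width are all assumptions made for illustration. It shows only the general pattern of turning each modality into same-sized tokens that a sequence model could consume together.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical container for one timestep of already-encoded inputs."""
    camera_features: List[List[float]]  # one feature vector per camera
    tactile_features: List[float]       # one vector from the tactile encoder
    joint_positions: List[float]        # proprioception: raw joint values


def project(vector: List[float], dim: int) -> List[float]:
    """Toy stand-in for a learned linear layer: pad with zeros or truncate to `dim`."""
    return (list(vector) + [0.0] * dim)[:dim]


def fuse(obs: Observation, dim: int = 4) -> List[List[float]]:
    """Return one `dim`-wide token per camera, plus one tactile and one joint token."""
    tokens = [project(v, dim) for v in obs.camera_features]
    tokens.append(project(obs.tactile_features, dim))
    tokens.append(project(obs.joint_positions, dim))
    return tokens


obs = Observation(
    camera_features=[[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7]],
    tactile_features=[0.9, 0.8, 0.7],
    joint_positions=[0.0, 1.5],
)
tokens = fuse(obs, dim=4)
```

In a real system the projections would be learned and the token sequence would feed a transformer; the sketch only illustrates that all three senses end up in one shared sequence.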

Image: a sleek robotic hand and a cup of coffee. By applying vision-tactile transformer technology, the Physical AI robot achieved more flexible, adaptive control. Credit: Neuroscience News

The full details were published in IEEE Robotics and Automation Letters on July 2, 2025.

Machine learning enables robots to learn human-like movement and decision patterns from demonstrations. ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation), developed at Stanford, provides a low-cost, open-source basis for teleoperation and data collection. Using that platform, the research team added tactile hardware and an end-to-end learning pipeline to capture and exploit touch information during teleoperated demonstrations.

Vision-only controllers often struggle with tasks where tactile texture or adhesiveness matters. For instance, distinguishing the hook and loop sides of Velcro or sensing whether a zip tie is properly aligned can be trivial by touch but ambiguous by sight. The research team therefore focused on integrating tactile sensing to enable operational decisions based on contact and texture, not just visual appearance.

“To overcome these limitations, we developed a system that also enables operational decisions based on the texture of target objects—information that is hard to infer from vision alone,” says Mitsuhiro Hayashibe, professor at Tohoku University’s Graduate School of Engineering. “This work is an important step toward multimodal physical AI that integrates vision, touch, and other senses.”

Named TactileAloha, the system uses a pre-trained ResNet to encode tactile signals and fuses these encodings with visual and proprioceptive features. A transformer-based policy with action chunking predicts sequences of future actions, while a weighted loss emphasizes near-term actions during training. An improved temporal aggregation scheme at deployment further sharpens action timing and precision.
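A weighted loss of this kind can be sketched as follows. The exponential weighting, the `decay` parameter, and the use of mean absolute error are assumptions chosen for illustration; the article does not state the exact form the team used.

```python
import math
from typing import List


def chunk_weights(chunk_size: int, decay: float = 0.1) -> List[float]:
    """Exponentially decaying weights over a chunk, normalized to sum to 1 (assumed form)."""
    raw = [math.exp(-decay * k) for k in range(chunk_size)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_l1_loss(predicted: List[List[float]],
                     target: List[List[float]],
                     weights: List[float]) -> float:
    """Weighted mean absolute error over a chunk of future actions.

    predicted, target: one action vector per future step.
    weights: one weight per step; earlier steps get larger weights.
    """
    if not (len(predicted) == len(target) == len(weights)):
        raise ValueError("chunk lengths must match")
    loss = 0.0
    for pred_step, true_step, w in zip(predicted, target, weights):
        step_error = sum(abs(p - t) for p, t in zip(pred_step, true_step)) / len(true_step)
        loss += w * step_error
    return loss


weights = chunk_weights(chunk_size=4, decay=0.5)
target = [[0.0, 0.0]] * 4
early_miss = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # error on the next action
late_miss = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]   # same error, three steps out
loss_early = weighted_l1_loss(early_miss, target, weights)
loss_late = weighted_l1_loss(late_miss, target, weights)
```

With these weights, the same-sized mistake costs more when it falls on the next action than when it falls on the last action in the chunk, which is the sense in which training "emphasizes near-term actions".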

In experiments, the team introduced two bimanual tasks, zip tie insertion and Velcro fastening, both of which demand tactile perception to align and manipulate objects correctly. The robot adapted its manipulation sequences in response to tactile feedback, enabling it to complete texture-related tasks that camera-only methods could not. On average, the method delivered a relative performance improvement of roughly 11.0% over a state-of-the-art tactile baseline.
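For readers unfamiliar with the term, a relative improvement is measured against the baseline's own score rather than in absolute percentage points. The symbols and the numbers below are hypothetical and are not taken from the paper; they only show how such a figure is computed.

```latex
\[
\text{relative improvement}
  = \frac{s_{\text{new}} - s_{\text{base}}}{s_{\text{base}}} \times 100\%
\]
% Hypothetical example: s_base = 0.800 and s_new = 0.888 give
\[
\frac{0.888 - 0.800}{0.800} \times 100\% = 11.0\%
\]
```

In this made-up case the absolute gain is 8.8 percentage points, while the relative gain is 11.0%.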

By combining multiple sensory inputs into a single control policy, TactileAloha enables more flexible, responsive motions and improves success on tasks that need touch-sensitive judgment. These capabilities open the door to many practical uses, from routine domestic chores to delicate handling in industrial and caregiving contexts.

The research team includes members from Tohoku University’s Graduate School of Engineering, the Centre for Transformative Garment Production at Hong Kong Science Park, and the University of Hong Kong.

About this AI and robotics research news

Author: Public Relations
Source: Tohoku University
Contact: Public Relations, Tohoku University
Image: The image is credited to Neuroscience News

Original Research: Open access. “TactileAloha: Learning Bimanual Manipulation with Tactile Sensing” by Mitsuhiro Hayashibe et al., IEEE Robotics and Automation Letters.

Abstract

TactileAloha: Learning Bimanual Manipulation with Tactile Sensing

Tactile texture is critical for many manipulation tasks but is difficult to observe reliably with cameras alone. To address this, we present TactileAloha: an integrated tactile-vision robotic system based on the ALOHA teleoperation platform. A gripper-mounted tactile sensor captures fine-grained contact information and supports real-time visualization during teleoperation to facilitate efficient data collection.

We encode tactile signals using a pre-trained ResNet and fuse them with visual and proprioceptive features. A transformer-based policy with action chunking processes the combined observations to predict future actions. Training uses a weighted loss that emphasizes near-future actions, and deployment employs an improved temporal aggregation scheme to increase action accuracy.
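Temporal aggregation in a chunked policy can be sketched as follows. When the policy predicts a chunk at every step, several earlier chunks each contain a proposal for the current step, and these can be blended. The exponential weighting by prediction age and the `recency_gain` parameter are assumptions for illustration, not the specific improved scheme described above.

```python
import math
from typing import List, Tuple


def aggregate_action(predictions: List[Tuple[int, List[float]]],
                     recency_gain: float = 0.5) -> List[float]:
    """Blend overlapping proposals for one timestep.

    predictions: (age, action) pairs, where age is the number of steps
    since the chunk containing that proposal was predicted.
    A positive recency_gain favors newer proposals (assumed weighting).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    weights = [math.exp(-recency_gain * age) for age, _ in predictions]
    total = sum(weights)
    dim = len(predictions[0][1])
    blended = [0.0] * dim
    for w, (_, action) in zip(weights, predictions):
        for i in range(dim):
            blended[i] += (w / total) * action[i]
    return blended


def run_policy(chunks: List[List[List[float]]], chunk_size: int,
               recency_gain: float = 0.5) -> List[List[float]]:
    """chunks[t] holds the actions predicted at step t for steps t .. t+chunk_size-1."""
    executed = []
    for t in range(len(chunks)):
        overlapping = []
        for start in range(max(0, t - chunk_size + 1), t + 1):
            age = t - start
            overlapping.append((age, chunks[start][age]))
        executed.append(aggregate_action(overlapping, recency_gain))
    return executed


# Three toy chunks of 1-D actions; each newer chunk proposes a larger value.
chunks = [[[0.0]] * 3, [[1.0]] * 3, [[2.0]] * 3]
executed = run_policy(chunks, chunk_size=3, recency_gain=0.5)
uniform = run_policy(chunks, chunk_size=3, recency_gain=0.0)
```

Setting `recency_gain` to zero reduces the blend to a plain average of all overlapping proposals, while larger values make the executed action track the newest prediction more closely.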

We evaluate on two bimanual tasks—zip tie insertion and Velcro fastening—both of which require perceiving texture and aligning object orientations using two hands. Our method systematically adapts generated manipulation sequences in response to tactile sensing. Results demonstrate that incorporating tactile information enables successful handling of texture-related tasks that vision-only methods cannot address and yields an average relative performance improvement of approximately 11.0% over a state-of-the-art tactile baseline.