Robots Detect Objects by Listening to Vibrations

Summary: New research from Duke University introduces SonicSense, a tactile-acoustic sensing system that lets robots interpret objects through vibrations. With contact microphones embedded in its fingertips, a robot can tap, grasp, or shake an item to capture vibrations and sound, which lets it detect the item's material type, shape, and internal contents.

SonicSense combines contact-based audio sensing with modern AI to extend robotic perception beyond vision. This multimodal approach helps robots identify previously unseen objects and perceive complex physical properties in cluttered or low-visibility environments, representing a major advance in robotic sensing and interaction.

Key Facts:

SonicSense enables robots to “hear” objects through vibrations recorded at the fingertip.
Robots can infer material, 3D shape, and internal contents by tapping, shaking, or grasping objects.
AI-driven analysis of contact audio lets the system recognize familiar objects in as few as four interactions and reach conclusions about unfamiliar ones within about twenty.

Source: Duke University

Imagine sitting in a dark theater and shaking your soda cup to estimate how much ice and liquid remain. Or imagine gently tapping an armrest to tell whether it is solid wood or a hollow plastic replica. These everyday actions use sound and vibration to reveal physical properties without sight. SonicSense brings that same acoustic intuition to robots, enabling them to use touch-based audio to extract richer information about the objects they handle.

SonicSense features a robotic hand with four fingers, each equipped with a contact microphone embedded in the fingertip. Credit: Neuroscience News

Described in a paper accepted to the Conference on Robot Learning (CoRL 2024), SonicSense was developed by researchers in the laboratory of Boyuan Chen at Duke University. The system equips a four-fingered robotic hand with contact microphones embedded in each fingertip. Because these microphones are in direct contact with the object, they pick up vibration signatures while largely rejecting ambient noise, making the acoustic signal highly informative and robust in real-world settings.

During interaction, the robot records the vibration signals produced when it taps, shakes, or grasps an object. SonicSense extracts frequency-domain features from those signals and applies machine learning models trained on prior interactions. From these features the system estimates the object's material, reconstructs its three-dimensional shape, and even gauges its internal contents, such as the number of dice inside a box or the amount of liquid in a bottle.
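The article names only "frequency-domain features" and "machine learning models," so the Python below is a minimal sketch of the general idea rather than the SonicSense pipeline. It assumes a simple FFT band-energy feature and a nearest-centroid classifier, both swapped in for illustration in place of the learned models the researchers use. The sample rate, band layout, synthetic tap generator, and material labels are all hypothetical.

```python
import numpy as np

SAMPLE_RATE = 48_000  # assumed sampling rate in Hz (hypothetical)


def band_energy_features(clip, sample_rate=SAMPLE_RATE, n_bands=16, f_max=8_000.0):
    """Summarize one contact-audio clip as log energies in equal-width frequency bands."""
    clip = np.asarray(clip, dtype=float)
    # Power spectrum of the whole clip; no window, since a tap already decays toward zero.
    power = np.abs(np.fft.rfft(clip)) ** 2
    freqs = np.fft.rfftfreq(clip.size, d=1.0 / sample_rate)
    edges = np.linspace(0.0, f_max, n_bands + 1)
    energies = np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    feats = np.log1p(energies)  # compress the large dynamic range of vibration energy
    return feats / (np.linalg.norm(feats) + 1e-12)  # unit length: reduce tap-strength effects


class NearestCentroidClassifier:
    """Assign a clip the label whose mean training feature vector is closest."""

    def __init__(self):
        self.centroids = {}

    def fit(self, clips_by_label):
        for label, clips in clips_by_label.items():
            feats = np.stack([band_energy_features(c) for c in clips])
            self.centroids[label] = feats.mean(axis=0)
        return self

    def predict(self, clip):
        f = band_energy_features(clip)
        return min(self.centroids, key=lambda lbl: np.linalg.norm(f - self.centroids[lbl]))


def synthetic_tap(freq_hz, decay_s, rng, duration_s=0.1, noise=0.01):
    """Toy stand-in for a fingertip recording: a decaying sinusoid plus sensor noise."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    ring = np.exp(-t / decay_s) * np.sin(2 * np.pi * freq_hz * t)
    return ring + noise * rng.standard_normal(t.size)


rng = np.random.default_rng(0)
training = {
    "ringing_material": [synthetic_tap(3_200 + 50 * k, 0.02, rng) for k in range(5)],
    "damped_material": [synthetic_tap(600 + 20 * k, 0.005, rng) for k in range(5)],
}
clf = NearestCentroidClassifier().fit(training)
print(clf.predict(synthetic_tap(3_300, 0.02, rng)))  # ringing_material
print(clf.predict(synthetic_tap(650, 0.005, rng)))   # damped_material
```

Normalizing the feature vector reduces the influence of how hard the finger taps, so the classifier compares the shape of the spectrum rather than its loudness. A practical system would replace the synthetic taps with real recordings and the centroid rule with a trained model.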

For objects already in its database, SonicSense can identify items in as few as four interactions. For unfamiliar objects, the system typically integrates information across up to twenty interactions to reach a reliable conclusion. This incremental, interaction-driven approach mirrors how humans refine hypotheses about an object through repeated touch and sound cues.
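The gap between four interactions for a known object and up to twenty for an unfamiliar one fits the pattern of accumulating evidence until a decision is confident enough. The Python below is a generic sketch of that pattern, a naive-Bayes update with a stopping threshold; it is not taken from the SonicSense paper. The 0.95 threshold, the object labels, and the likelihood values are invented for illustration, and only the cap of 20 interactions echoes the article.

```python
import math
from itertools import repeat


def accumulate_evidence(likelihood_stream, labels, threshold=0.95, max_interactions=20):
    """Fuse per-interaction class likelihoods with a naive-Bayes (independence) update.

    likelihood_stream yields dicts mapping label -> P(observation | label).
    Stops once one label's posterior reaches `threshold` or the interaction
    budget is spent. Returns (best_label, posterior_of_best, interactions_used).
    """
    log_post = {lbl: -math.log(len(labels)) for lbl in labels}  # uniform prior
    posterior = {lbl: 1.0 / len(labels) for lbl in labels}
    used = 0
    for used, likelihood in enumerate(likelihood_stream, start=1):
        for lbl in labels:
            log_post[lbl] += math.log(max(likelihood[lbl], 1e-12))  # guard against log(0)
        # Normalize in log space so long runs of small likelihoods do not underflow.
        top = max(log_post.values())
        z = sum(math.exp(v - top) for v in log_post.values())
        posterior = {lbl: math.exp(v - top) / z for lbl, v in log_post.items()}
        if max(posterior.values()) >= threshold or used >= max_interactions:
            break
    best = max(posterior, key=posterior.get)
    return best, posterior[best], used


labels = ["mug", "bottle", "box"]  # hypothetical object classes
distinctive = {"mug": 0.7, "bottle": 0.2, "box": 0.1}  # each tap clearly favors one label
ambiguous = {"mug": 0.4, "bottle": 0.35, "box": 0.25}   # each tap barely separates labels

best, p, n = accumulate_evidence(repeat(distinctive), labels)
print(best, round(p, 3), n)  # mug 0.974 3
best, p, n = accumulate_evidence(repeat(ambiguous), labels)
print(best, round(p, 3), n)  # mug 0.935 20
```

With evidence that clearly favors one label, the loop stops after three interactions; with weakly separating evidence it exhausts the 20-interaction budget without reaching the threshold. The independence assumption behind the update is a simplification, since repeated taps on the same spot produce correlated signals.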

SonicSense advances prior acoustic-touch work in three important ways: it uses a multi-fingered hand rather than a single contact point, employs contact microphones that minimize background noise, and leverages contemporary AI methods to interpret complex acoustic signatures. This combination improves performance on objects with mixed materials, transparent or reflective surfaces, and complicated geometries—cases that often challenge vision-only systems.

The research team prioritized real-world robustness by collecting data in an open lab environment, allowing the robot to interact autonomously with diverse, messy objects rather than relying solely on controlled datasets or human-guided trials. That practical focus narrows the gap between laboratory demonstrations and real-world deployment in robotics applications such as inspection, household assistance, and warehouse automation.

SonicSense is also inexpensive to build. It relies on commercially available contact microphones similar to those used by musicians, together with 3D-printed parts and other off-the-shelf components. The hardware cost for the fingertip sensing array is reported to be slightly over $200, which puts the approach within reach of many research and development teams.

Looking ahead, the team plans to extend SonicSense to handle interactions with multiple objects simultaneously and to integrate object-tracking algorithms so robots can operate reliably in dynamic, cluttered environments. They also see potential to combine contact acoustic sensing with other modalities—such as pressure, temperature, and high-resolution touch—to enable dexterous manipulation and nuanced tactile feedback in advanced robotic hands.

“While vision remains a critical component of robotic perception, sound and vibration bring complementary information that can reveal hidden or ambiguous properties,” said Boyuan Chen. “SonicSense demonstrates that contact audio can be a practical, affordable, and powerful sensing modality for robots working in unstructured, everyday settings.”

Funding: This research was supported by the Army Research Laboratory STRONG program (W911NF2320182, W911NF2220113), DARPA’s FoundSci program (HR00112490372), and DARPA TIAMAT (HR00112490419).

Note: The image above is a representation of a robotic hand and not a direct photograph of the SonicSense system.

About this robotics research news

Author: Ken Kingery
Source: Duke University
Contact: Ken Kingery, Duke University
Image: Image credited to Neuroscience News

Original Research: Findings to be presented at the Conference on Robot Learning (CoRL 2024)