What an AI Birdsong Decoder Reveals About the Human Brain

Summary: Canaries are accomplished vocal learners, capable of combining 30 to 40 distinct syllables into long, structured songs. Researchers have built TweetyBERT, a self-supervised AI model that automatically parses these songs with accuracy comparable to expert human annotators.

TweetyBERT adapts BERT, a transformer architecture originally developed for human language, to detect notes, syllables, and phrases in birdsong without requiring human-labeled training data. By automating the time-consuming task of annotating vocal sequences, this tool gives neuroscientists a fast, scalable way to study how the brain learns and produces complex vocal behavior. The same approach also holds promise for broader applications, from marine mammal vocalizations to environmental monitoring of wild bird populations.

Key Facts

Self-supervised learning: TweetyBERT trains itself by predicting masked fragments of audio, so it does not depend on large, hand-labeled datasets.
Canaries as a model: Songbirds like canaries are important models for vocal learning because their lifelong capacity to acquire and refine songs parallels aspects of human speech learning.
Expert-level performance: The model’s annotations closely match those of skilled human annotators, and it processes data far more quickly, enabling large-scale and longitudinal studies.
BERT-based transformer: The system repurposes the Bidirectional Encoder Representations from Transformers framework to learn acoustic patterns and temporal structure in birdsong.
Wide potential: Though developed for canaries, the underlying method can be adapted to other species, and initial tests are extending it to dolphin and whale vocalizations.

Source: University of Oregon

TweetyBERT: automated, scalable annotation of canary vocalizations

A research team at the University of Oregon developed TweetyBERT, a self-supervised transformer that segments and classifies canary vocalizations with performance comparable to domain experts. By learning the structure of song directly from audio data, the model identifies behavioral units (notes, syllables, and phrases) without exposure to human labels. This capability both accelerates analysis and opens new avenues for studying the neural basis of learned vocal behavior.

TweetyBERT uses self-supervised machine learning to autonomously identify and categorize the complex vocal units of songbirds, providing a new tool for studying the neural foundations of language. Credit: Neuroscience News

“Most current techniques for analyzing animal vocalizations need large sets of human-labeled examples, which are slow and expensive to produce,” says Tim Gardner, associate professor of bioengineering at the University of Oregon’s Knight Campus. “TweetyBERT instead learns from unlabeled recordings by predicting masked audio fragments, allowing it to discover the building blocks of song on its own and to annotate sequences rapidly.”
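The masked-prediction idea Gardner describes can be illustrated with a toy example. This is a minimal sketch, not TweetyBERT's code. It assumes the audio has already been converted to a spectrogram, represented as a list of time frames of frequency-bin values. The span length, the 30% masking fraction, the mean-squared-error loss, and all function names are illustrative assumptions. A trivial `placeholder_predict` function stands in for the transformer, so the sketch shows only how the training signal is built: hide frames, predict them, and score the prediction on the hidden frames alone.

```python
import math
import random

def mask_spans(num_frames, span_len, mask_frac, rng):
    """Hide random contiguous spans of frames until at least mask_frac of them are masked."""
    masked = [False] * num_frames
    target = int(num_frames * mask_frac)
    while sum(masked) < target:
        start = rng.randrange(0, num_frames - span_len + 1)
        for t in range(start, start + span_len):
            masked[t] = True
    return masked

def apply_mask(spec, masked):
    """Replace every masked frame with zeros; visible frames are copied unchanged."""
    n_bins = len(spec[0])
    return [[0.0] * n_bins if m else list(frame) for frame, m in zip(spec, masked)]

def placeholder_predict(corrupted, masked):
    """Hypothetical stand-in for the transformer: fill each masked frame with the
    mean of all visible frames. A trained bidirectional model would instead infer
    each hidden frame from the context around it."""
    visible = [f for f, m in zip(corrupted, masked) if not m]
    n_bins = len(corrupted[0])
    mean = [sum(f[b] for f in visible) / len(visible) for b in range(n_bins)]
    return [list(mean) if m else list(f) for f, m in zip(corrupted, masked)]

def masked_mse(pred, target, masked):
    """Mean squared error over masked frames only (the self-supervised loss)."""
    errs = [
        (p_b - t_b) ** 2
        for p, t, m in zip(pred, target, masked) if m
        for p_b, t_b in zip(p, t)
    ]
    return sum(errs) / len(errs)

# Toy "spectrogram": 100 time frames x 8 frequency bins of synthetic values.
rng = random.Random(0)
spec = [[math.sin(0.3 * t + b) for b in range(8)] for t in range(100)]
masked = mask_spans(len(spec), span_len=5, mask_frac=0.3, rng=rng)
corrupted = apply_mask(spec, masked)
pred = placeholder_predict(corrupted, masked)
loss = masked_mse(pred, spec, masked)
print(f"masked {sum(masked)} of {len(spec)} frames, loss = {loss:.3f}")
```

Because the loss ignores visible frames, a model cannot lower it by copying its input. To do well, it must capture enough of the song's structure to fill in what is missing.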

Canaries are widely used in neuroscience because they continue to learn and modify songs over their lifetime, offering an accessible model of complex learned behaviors. Graduate student George Vengrovski developed TweetyBERT to automatically annotate canary song, which typically consists of dozens of distinct syllables arranged into longer sequences. The resulting annotations make it practical to map which neural circuits are active for specific vocal elements and to investigate how those circuits change with learning or experience.
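Once song is annotated, the sequence structure described above can be summarized directly from the labels. In canary song, a phrase is commonly described as a run of repeats of a single syllable type. Assuming that definition, this hypothetical sketch collapses a syllable sequence into phrases and counts which phrase type follows which. The function names and toy data are illustrative and do not come from the study.

```python
from collections import Counter
from itertools import groupby

def to_phrases(syllables):
    """Collapse consecutive repeats of one syllable type into (syllable, repeat_count) phrases."""
    return [(syl, sum(1 for _ in run)) for syl, run in groupby(syllables)]

def phrase_transitions(phrases):
    """Count how often each phrase type is immediately followed by each other type."""
    types = [syl for syl, _ in phrases]
    return Counter(zip(types, types[1:]))

# Hypothetical annotated song: syllable labels in the order they were sung.
song = ["A"] * 4 + ["B"] * 3 + ["C"] * 5 + ["A"] * 3 + ["B"]
phrases = to_phrases(song)
print(phrases)                      # [('A', 4), ('B', 3), ('C', 5), ('A', 3), ('B', 1)]
print(phrase_transitions(phrases))  # Counter({('A', 'B'): 2, ('B', 'C'): 1, ('C', 'A'): 1})
```

Computed per bird or per recording session, summaries like these are one possible way to compare song across learning or to align vocal elements with recorded neural activity.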

TweetyBERT adapts the bidirectional transformer architecture, originally developed for human language, to acoustic signals. The network is trained to reconstruct masked portions of audio, which forces it to encode meaningful acoustic and temporal relationships. Applied to recordings of canary song, the model groups similar sounds and detects boundaries between notes and syllables, producing labels and timestamps comparable to those of expert human annotators.
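The final step described above, turning per-frame labels into syllable boundaries and timestamps, can be sketched as follows. The article does not say how TweetyBERT produces its labels. This sketch assumes that each spectrogram frame has already been assigned an integer label (with `-1` marking silence) and that frames are spaced `hop_s` seconds apart. The `Segment` class, `frames_to_segments`, and the `min_frames` filter for dropping one-frame flickers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    label: int
    onset_s: float   # start time in seconds
    offset_s: float  # end time in seconds

def frames_to_segments(frame_labels, hop_s, silence=-1, min_frames=1):
    """Group consecutive frames that share a label into timestamped segments."""
    segments = []
    start = 0
    for t in range(1, len(frame_labels) + 1):
        # A boundary falls wherever the label changes, or at the end of the recording.
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            label = frame_labels[start]
            # Keep non-silent runs that are long enough to count as a vocal unit.
            if label != silence and t - start >= min_frames:
                segments.append(Segment(label, start * hop_s, t * hop_s))
            start = t
    return segments

# Hypothetical per-frame output with a 10 ms hop: silence, syllable 3, silence, syllable 7, silence.
labels = [-1, -1, 3, 3, 3, -1, 7, 7, 7, 7, -1]
for seg in frames_to_segments(labels, hop_s=0.01):
    print(f"syllable {seg.label}: {seg.onset_s:.2f} to {seg.offset_s:.2f} s")
```

This prints two segments: syllable 3 from 0.02 to 0.05 s and syllable 7 from 0.06 to 0.10 s. Raising `min_frames` trades sensitivity to very short notes for robustness against single-frame label noise.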

Beyond laboratory neuroscience, the approach could also be applied in the field. With modifications for different acoustic environments and species, self-supervised models like TweetyBERT could analyze large volumes of passive recordings from wild bird populations. Such analyses could reveal shifts in vocal behavior linked to habitat disturbance, noise pollution, or changing climate conditions, potentially serving as an automated early-warning system for ecosystem health.

“Although we designed TweetyBERT for canaries, the core idea is not species-specific,” Gardner notes. “Millions of hours of animal vocalizations are already recorded, but most go unanalyzed because manual annotation is impractical. Self-supervised tools let researchers scale analysis across individuals, populations, and species.”

Frequently asked questions

Q: Can TweetyBERT “talk” to birds?

A: No. TweetyBERT does not translate birdsong into human language. Instead, it parses the structure of song—identifying recurring elements and their sequence—so researchers can map vocal units to neural activity and behavior.

Q: Why use canaries to study human speech?

A: Canaries are vocal learners that use dedicated brain circuits for listening, imitating, and producing sounds. Studying how they learn and organize dozens of syllables helps illuminate general principles of vocal learning that are relevant to human speech.

Q: How can this work support environmental monitoring?

A: TweetyBERT’s scalability makes it suitable for processing thousands of hours of field recordings. Automated detection of subtle changes in song structure or timing can indicate how populations respond to noise, habitat alteration, or climate change, helping conservationists and ecologists monitor ecosystem health more efficiently.

Editorial Notes:

This article was edited by a Neuroscience News editor.
The journal paper was reviewed in full by editorial staff.
Additional explanatory context was added to clarify methods and potential applications.

About this research

Author: Molly Blancett
Source: University of Oregon
Contact: Molly Blancett, University of Oregon
Image: Image credited to Neuroscience News

Original research: TweetyBERT: Automated parsing of birdsong through self-supervised machine learning. Authors: George Vengrovski, Miranda R. Hulsey-Vincent, Melissa A. Bemrose, and Timothy J. Gardner. DOI: 10.1016/j.patter.2025.101491

Abstract

TweetyBERT: Automated parsing of birdsong through self-supervised machine learning

Deep neural networks can be trained to parse animal vocalizations, identifying the basic units of communication and annotating sequences for downstream statistical and neural analyses. Traditional approaches rely heavily on human-labeled training data, making large-scale analysis costly and slow. The problem of fully unsupervised parsing of animal vocalizations remains challenging.

To address this, the authors introduce TweetyBERT, a self-supervised transformer neural network tailored to birdsong analysis. The model learns to predict masked or hidden fragments of audio without supervision, enabling it to discover meaningful acoustic and temporal patterns. Applied to canary recordings, TweetyBERT autonomously identifies behavioral units such as notes, syllables, and phrases, closely matching expert annotations. Self-supervised models designed for animal communication promise to accelerate analysis of unlabeled vocal datasets and broaden opportunities for comparative and field studies.