Summary: The NIH Baby Toolbox is an innovative, norm-referenced assessment system created to evaluate infant and toddler development from as early as 16 days up to 42 months. Combining gaze-based paradigms, video presentation, and tablet delivery, the Toolbox measures cognitive, motor, language, and social-emotional skills with high precision and efficiency.
Unlike many traditional early-childhood measures, the NIH Baby Toolbox was designed to be practical for widespread use: it requires minimal administrator training, avoids expensive or specialized equipment, and produces rapid, reliable results. This makes earlier detection of developmental delays and timely connection to interventions more feasible than before.
Key facts
- First nationally validated tool for children under 3: The Baby Toolbox addresses a major gap by providing a standardized assessment for infants and toddlers younger than three years.
- Modern, child-friendly methods: The assessment relies on gaze-based tasks and video stimuli, removing the need for verbal responses or complex testing apparatus.
- Validated and scalable: Normed on a nationally representative sample of more than 2,500 infants and toddlers, with evidence of reliability across English- and Spanish-speaking households.
Source: Northwestern University
Northwestern University developmental scientists and medical social science experts led the development of this newest addition to the NIH Toolbox, which provides a standardized way to measure cognitive, language, motor, and social-emotional development in infants and toddlers aged 16 days to 42 months.

Previously, the NIH Toolbox covered ages three years through adulthood, leaving an urgent need for a validated, research-based battery for infants and toddlers. Early life experience and early development strongly influence later outcomes; unrecognized delays in the first months and years can cascade into longer-term difficulties. A practical, accurate assessment for very young children is therefore essential for effective early intervention.
A special eight-article issue of the journal Infant Behavior and Development details the methodology, validation, and early results for the NIH Baby Toolbox and provides the scientific foundation for this new assessment suite.
Supporting positive developmental outcomes in new ways
The development team faced two principal challenges. First, the assessment had to be comprehensive while avoiding costly licensing, multiple platforms, or complex training. Second, it needed measures that reliably engage infants and toddlers who cannot complete traditional paper-and-pencil tasks or answer direct questions. The solution integrates short, tablet-delivered tasks with video and gaze-tracking paradigms to capture infants’ responses naturally and noninvasively.
Gaze-based methods, long used in laboratory research to track early attention and learning, are now incorporated into a clinically oriented, standardized tool. By capturing where and how infants look in response to stimuli, these tasks reveal emerging cognitive and language skills without requiring spoken responses.
Developmental scientist Sandra Waxman, Louis W. Menk Professor of Psychology at Northwestern, who has studied gaze-based learning for decades, emphasizes that the Toolbox allows clinicians to identify typical development and early deviations in ways that were previously impractical at scale.
“This new, standardized assessment enables earlier identification of developmental risk and, consequently, earlier interventions that can support better outcomes,” Waxman said.
Efficient, precise measurement
Richard Gershon of Northwestern’s Feinberg School of Medicine, who led the original NIH Toolbox, served as principal investigator for the Baby Toolbox project. The new infant-and-toddler measures are designed to complement the existing NIH Toolbox for older children and adults, enabling seamless longitudinal tracking from infancy onward.
The Baby Toolbox offers automated or guided scoring, no per-patient fees, and much shorter administration and training time than comparable instruments. This combination supports broader adoption in research, clinical practice, and community settings such as early education or public health screening.
Development process and validation
To create the assessment suite, the team surveyed more than 400 domain experts and conducted a systematic review of the literature to select candidate measures suitable for tablet-based delivery. Selection criteria included existing validity evidence, feasibility for children aged 16 days to 42 months, and ease of administration and scoring within a brief session.
NIH Baby Toolbox scientific director Aaron Kaat, professor of medical social sciences, notes that many established measures required extensive training or costly resources, so the project both adapted existing tests and partnered with leading research teams to develop new, tablet-friendly measures that met the project’s practical requirements.
Before public release, the team completed a comprehensive norming study with more than 2,500 infants and toddlers from English- and Spanish-speaking households. The norming results showed high test-retest reliability and demonstrated that the app reliably assesses developmental milestones across diverse populations.
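For readers unfamiliar with the term, test-retest reliability asks whether the same children earn similar scores when assessed twice. The sketch below is illustrative only: the article does not say which statistic the team used, so it assumes a Pearson correlation (an intraclass correlation is another common choice). The function name `pearson_r` and all scores are hypothetical and are not data from the norming study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each session")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five children at two sessions
# (made-up numbers for illustration, not results from the study).
session_1 = [92, 105, 88, 110, 97]
session_2 = [95, 103, 90, 108, 99]

retest_r = pearson_r(session_1, session_2)
print(f"test-retest r = {retest_r:.2f}")
```

A value close to 1 means children kept roughly the same rank order across the two sessions, which is what "high test-retest reliability" refers to.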
About this neurodevelopment research news
Author: Kristin Samuelson
Contact: Kristin Samuelson – Northwestern University
Image credit: Neuroscience News
Original research (open access): “The NIH Baby Toolbox: A new norm-referenced tool for evaluating infant and toddler development” by Sandra Waxman et al., published in Infant Behavior and Development. The special issue in which it appears presents the methods, adaptations for tablet administration, and validation results supporting the NIH Baby Toolbox.
Abstract
The NIH Baby Toolbox: A new norm-referenced tool for evaluating infant and toddler development
Commissioned under the National Institutes of Health (NIH) Blueprint for Neuroscience Research, the NIH Infant and Toddler Toolbox (the NIH Baby Toolbox, or NBT) provides assessments for children aged 1 to 42 months from English- and Spanish-speaking homes. The NBT evaluates cognition, motor skills, and social-emotional functioning using tablet-friendly, norm-referenced measures. The special journal collection focuses on how tests were selected or adapted for tablet use, how they were normed, and the evidence supporting their validity and reliability for use in infancy and early toddlerhood. Papers in the collection examine domain-specific measures as well as methodological and technological advances relevant to infant and toddler assessment.