Summary: A new study explores the genetic foundations of early language development and how those genetic factors relate to later cognitive skills and neurodevelopmental disorders such as ADHD and ASD.
Drawing on vocabulary measures from more than 17,000 children who speak English, Danish, or Dutch, researchers examined how genetic variation shapes word production and comprehension from infancy through toddlerhood. The findings show that vocabulary size in the first two years of life reflects genetic influences that also predict later literacy, general cognitive ability, and risks for neurodevelopmental conditions.
Notably, the study reveals a developmental shift in the genetic links with ADHD-related traits: a larger expressive vocabulary in infancy is genetically linked to higher ADHD risk, whereas by toddlerhood a smaller receptive vocabulary is genetically linked to more ADHD symptoms. This dynamic pattern suggests that the relationship between early language and later neurodevelopmental outcomes is complex and changes rapidly during early childhood.
Key Facts:
- Genetic basis of early language: The study identifies genetic influences on vocabulary size in infancy and toddlerhood and links those influences to later literacy, cognition, and ADHD-related outcomes.
- Developmental shift in ADHD associations: A larger expressive vocabulary in infancy was genetically associated with increased ADHD risk, whereas in toddlerhood a smaller receptive vocabulary was genetically associated with more ADHD symptoms, indicating that these genetic relationships change over time.
- Implications for intervention: Results underscore the value of early assessment and suggest that understanding genetic predispositions can inform timing and targets for educational and therapeutic support.
Source: Max Planck Institute
Early language development predicts later language, reading, and learning abilities. Difficulties with language development are also linked to neurodevelopmental conditions including Attention-Deficit/Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD).
Most children begin producing recognizable words between about 10 and 15 months of age. By roughly two years, many children can say between 100 and 600 words and understand many more. Vocabulary trajectories vary widely between children, and part of this variation can be attributed to genetic differences, according to Beate St Pourcain, the study's senior researcher.

Word production and comprehension
To clarify how genetics contributes to children’s early word production and comprehension, the research team performed a genome-wide association meta-analysis (GWAS) of vocabulary size at two developmental phases: infancy (15–18 months) and toddlerhood (24–38 months). Using standardized checklists, parents reported which words their children could say (expressive vocabulary) and which words they understood (receptive vocabulary).
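The article does not say how the per-cohort results were combined. A widely used approach in GWAS meta-analysis is fixed-effect inverse-variance weighting, in which each cohort's effect estimate for a SNP is weighted by its precision. The sketch below assumes that scheme; the function name `ivw_meta` and all numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one SNP.

    betas: per-cohort effect estimates for the same SNP and allele.
    ses:   the matching standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each cohort is weighted by its precision, 1 / SE^2.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p


# Hypothetical estimates for one SNP in three cohorts (illustrative numbers only).
beta, se, z, p = ivw_meta([0.05, 0.03, 0.13], [0.02, 0.02, 0.04])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2g}")
```

Because the third cohort's standard error is twice as large, it receives a quarter of the weight, so the pooled estimate (0.05) sits well below the unweighted mean of 0.07. A real GWAS pipeline repeats this for millions of SNPs and adds quality control; this sketch covers only the pooling step.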
The analysis included vocabulary and genotype data from 17,298 children of European descent, covering 37,913 parent-reported vocabulary measures in English, Danish, and Dutch. Spoken-word counts were available for both infancy and toddlerhood, whereas understood-word counts were available only for toddlers. The team compared these early-life genetic signals with genetic summary statistics for later outcomes: literacy (spelling, reading, phoneme awareness), cognition (general intelligence and years of education), and neurodevelopmental conditions (genetic risk for ADHD and ASD), as well as with ADHD symptoms observed in subsets of participants.
“Learning to speak” and “speaking to learn”
The researchers uncovered multiple genetic influences on vocabulary size that shift in importance across the first two years. Both infant and toddler expressive vocabularies showed genetic links with later literacy skills—for example, spelling—while associations with broader cognitive measures (such as intelligence and educational attainment) appeared primarily for toddler vocabulary scores. This pattern supports the view that infants initially focus on learning to speak, whereas toddlers increasingly use language to support higher-level learning and cognitive development.
Intriguingly, the study found that a larger expressive vocabulary in infancy was genetically correlated with increased ADHD risk and more ADHD symptoms. By toddlerhood, however, the genetic relationship changed: a smaller receptive vocabulary was genetically associated with more ADHD symptoms. One possible explanation is that infant expressive vocabulary captures early speech-motor and communicative behaviors, while toddler vocabulary more strongly reflects cognitive and learning processes that relate differently to ADHD liability.
St Pourcain notes, “Genetic influences on vocabulary size change rapidly across less than two years. Viewing development dynamically helps us better understand the early causes of both typical and disordered language and cognition.” First author Ellen Verhoef adds that these results highlight the importance of collecting more early-life data: vocabulary measures in the first years can be informative about future behavior and cognitive outcomes.
About this genetics, ADHD, and language research news
Author: Marjolein Scherphuis
Contact: Marjolein Scherphuis – Max Planck Institute
Image credit: Neuroscience News
Original Research: Open access. “Genome-wide Analyses of Vocabulary Size in Infancy and Toddlerhood: Associations With Attention-Deficit/Hyperactivity Disorder, Literacy, and Cognition-Related Traits” by Ellen Verhoef et al., Biological Psychiatry.
Abstract
Background
Expressive vocabulary (words children produce) and receptive vocabulary (words children understand) increase rapidly in early childhood and are influenced in part by genetics. The study performed a meta–genome-wide association analysis of vocabulary acquisition and examined polygenic overlap with literacy, cognition, developmental phenotypes, and neurodevelopmental conditions including ADHD.
Methods
Researchers analyzed 37,913 parent-reported vocabulary measures from 17,298 children of European ancestry (English, Dutch, Danish). Meta-analyses targeted three measures: early-phase expressive vocabulary (infancy, 15–18 months), late-phase expressive vocabulary (toddlerhood, 24–38 months), and late-phase receptive vocabulary (toddlerhood, 24–38 months). The team estimated SNP-based heritability and genetic correlations and used multivariate models to characterize underlying genetic factor structures.
Results
Early vocabulary showed modest SNP-based heritability (SNP-h² = 0.08–0.24). Genetic overlap between infant expressive and toddler receptive vocabulary was minimal (rg = 0.07), while each was moderately correlated with toddler expressive vocabulary (rg = 0.69 and rg = 0.67), indicating a multifactorial genetic architecture. Both infant and toddler expressive vocabularies shared genetic links with literacy (for example, spelling: rg = 0.58 and rg = 0.79). Associations with educational attainment and intelligence emerged more clearly for toddler measures (for example, receptive vocabulary and intelligence: rg = 0.36). Genetic risk for ADHD correlated with larger infant expressive vocabulary (rg = 0.23); multivariate models in the ALSPAC (Avon Longitudinal Study of Parents and Children) cohort confirmed this link with ADHD symptoms (for example, at age 13: rg = 0.54) but showed a reversed association for toddler receptive vocabulary (rg = −0.74), highlighting developmental heterogeneity.
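To read the heritability and rg values above, it may help to recall the standard textbook definitions. SNP-based heritability is the share of phenotypic variance captured by common SNPs, and the genetic correlation between two traits is their SNP-based genetic covariance scaled by both genetic variances. The abstract does not state which estimation method the team used, so the block shows only the general definitions, not the study's model.

```latex
h^2_{\mathrm{SNP}} = \frac{\sigma^2_{g,\mathrm{SNP}}}{\sigma^2_P},
\qquad
r_g = \frac{\sigma_{g,12}}{\sqrt{\sigma^2_{g,1}\,\sigma^2_{g,2}}},
\qquad -1 \le r_g \le 1
```

Here σ²_P is the total phenotypic variance, σ²_{g,1} and σ²_{g,2} are the SNP-based genetic variances of traits 1 and 2, and σ_{g,12} is their genetic covariance. Under this definition, rg = 0.07 means almost no shared SNP-based genetic influence, while a negative value such as −0.74 means that genetic influences toward a larger vocabulary tend to go together with fewer ADHD symptoms.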
Conclusions
The genetic architecture of early vocabulary changes substantially during the first two years, shaping distinct polygenic association patterns with later ADHD risk, literacy outcomes, and cognition-related traits. These findings emphasize the importance of a developmental perspective when studying the genetics of language and related neurodevelopmental outcomes.