Study Finds Medical History May Predict Autism in Young Children

Summary: New machine learning models evaluate hundreds of clinical variables—drawing on doctor visits and de-identified insurance claims for conditions that may seem unrelated—to estimate the likelihood of autism spectrum disorder (ASD) in very young children.

Source: Penn State

Researchers from Penn State report that medical insurance claims, when analyzed with modern machine learning techniques, can help predict which young children are at increased risk for autism spectrum disorder. Their findings appear in BMJ Health & Care Informatics.

The interdisciplinary team built predictive models that examine connections across hundreds of clinical variables. These include routine doctor visits, medical procedures and diagnoses for issues that are not obviously linked to ASD. By integrating this longitudinal health information, the models estimate a child’s risk of receiving an ASD diagnosis between 18 and 30 months of age.

“De-identified insurance claims provide rich, longitudinal data about a patient’s medical history,” said corresponding author Qiushi Chen, assistant professor of industrial and manufacturing engineering at the Penn State College of Engineering. “Previous studies show that children with ASD often have higher rates of certain clinical symptoms—such as infections, gastrointestinal problems, seizures and behavioral signs. Those symptoms do not cause autism, but they frequently coexist with it in early childhood. We sought to combine these signals to quantify risk and improve early detection.”

The team trained machine learning algorithms to identify patterns and correlations among hundreds of features extracted from claims data. Their goal was to flag children at higher risk so clinicians could prioritize earlier evaluations and interventions.
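
As a rough illustration of how features might be extracted from claims, the sketch below counts each code recorded for a child before a cutoff age. The record layout, the code names and the `build_features` helper are all hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical claim records: (child_id, age_in_months, code_type, code).
# The codes are made-up placeholders, not codes used in the study.
claims = [
    ("c1", 2, "dx", "ear_infection"),
    ("c1", 9, "dx", "ear_infection"),
    ("c1", 14, "proc", "hearing_test"),
    ("c1", 20, "dx", "speech_delay"),
    ("c2", 6, "dx", "reflux"),
]

def build_features(claims, child_id, cutoff_months):
    """Count each code seen for one child strictly before the cutoff age."""
    counts = Counter()
    for cid, age, code_type, code in claims:
        if cid == child_id and age < cutoff_months:
            counts[f"{code_type}:{code}"] += 1
    return dict(counts)

# Only claims recorded before 18 months are used for a prediction at 18 months.
features_c1 = build_features(claims, "c1", cutoff_months=18)
print(features_c1)
```

Restricting the counts to claims dated before the prediction age keeps later information, such as the 20-month record here, out of the inputs.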

“Diagnosing autism requires observation and multiple screenings by clinicians, a process that can be lengthy,” said co-author Guodong Liu, associate professor of public health sciences, psychiatry and behavioral health, and pediatrics at the Penn State College of Medicine. “Many children do not receive a formal diagnosis until age four or five, missing the critical window when early intervention is most effective.”

The most commonly used screening instrument for toddlers is the Modified Checklist for Autism in Toddlers (M-CHAT), administered at well-child visits around 18 and 24 months. The M-CHAT consists of 20 caregiver-reported questions about behaviors such as eye contact, social interaction and some developmental milestones. Because normal development varies widely at these ages, the M-CHAT can produce both false positives and false negatives, delaying accurate diagnosis for many children.

“Our model aggregates multiple identified risk factors to estimate an overall likelihood of ASD,” Chen said. “Its performance is already comparable to—and in some settings slightly better than—the M-CHAT. When used together with the screening checklist, the combined approach offers a promising tool for clinicians.”

Liu added that embedding the prediction model into clinical workflows is practical and feasible. “This informatics approach can be integrated into an electronic health record (EHR) system as a clinical decision support tool. The model could flag children at high risk so clinicians and families can act sooner.”

The research was funded by the National Institutes of Health, the Penn State Social Science Research Institute and the Penn State College of Engineering. It led to a new $460,000 grant from the National Institute of Mental Health, awarded to Chen and Whitney Guthrie, a clinical psychologist at the Children’s Hospital of Philadelphia Center for Autism Research and assistant professor of psychiatry and pediatrics at the University of Pennsylvania Perelman School of Medicine.

With this grant, the team will further analyze how well combined hospital record data and screening results predict confirmed autism diagnoses, and will investigate other screening tools and data sources to better equip clinicians for early identification.

“Current screening methods miss many children on the autism spectrum and also produce long waitlists for evaluations,” Guthrie said. “High rates of false positives and false negatives mean some autistic children are missed while others are referred unnecessarily. Both outcomes increase delays for diagnosis and intervention. Pediatricians need more accurate screening strategies to identify all children who require evaluation as early as possible.”

Chen noted that part of the challenge is the limited supply of specialists—psychologists and developmental pediatricians—who can provide formal ASD diagnoses. “Industrial engineering can help make better use of resources,” he said. “By combining clinical expertise with predictive modeling, we aim to deliver a tool that primary care providers can use confidently to identify and refer at-risk children earlier.”

Additional contributors to the paper include first author Yu-Hsin Chen, a doctoral candidate in industrial and manufacturing engineering, who will base her dissertation on this grant-supported work; and co-author Lan Kong, professor of public health sciences at the Penn State College of Medicine.

Researchers are evaluating how well combined hospital record data and screening questionnaires predict autism diagnoses and exploring additional screening approaches to aid clinicians. Image is in the public domain.

About this autism research news

Author: Adrienne Berard
Contact: Adrienne Berard – Penn State

Original Research: Open access. “Early detection of autism spectrum disorder in young children with machine learning using medical claims data” by Yu-Hsin Chen et al., BMJ Health & Care Informatics


Abstract

Early detection of autism spectrum disorder in young children with machine learning using medical claims data

Objectives

Early diagnosis and timely intervention significantly improve long-term outcomes for children with autism spectrum disorder (ASD), yet current screening tools lack sufficient accuracy. This study aims to predict the risk of ASD in children aged 18 to 30 months by analyzing their medical histories using real-world health claims data.

Methods

Using the MarketScan Health Claims Database (2005–2016), researchers identified 12,743 children diagnosed with ASD and a random sample of 25,833 children without ASD. They developed two types of models to predict ASD diagnoses at 18–30 months: logistic regression with least absolute shrinkage and selection operator (LASSO) regularization, and random forest. Predictor variables included demographics, diagnostic codes and healthcare procedures drawn from early-life medical claims.
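
To show what the LASSO penalty does in a logistic regression, here is a minimal sketch that fits an L1-penalized model by proximal gradient descent on a tiny invented dataset. It is not the authors' implementation: the data, the two feature meanings, the penalty strength `lam` and the step size `lr` are all assumptions chosen for illustration, and the random forest model is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    """Shrink a value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent (illustrative)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Invented toy data with two binary features per child:
# feature 0 is common among positives; feature 1 appears in a single record.
X = [[1, 1]] + [[1, 0]] * 7 + [[0, 0]] * 2 + [[1, 0]] * 2 + [[0, 0]] * 8
y = [1] * 10 + [0] * 10

weights, intercept = fit_lasso_logistic(X, y)
print(weights, intercept)
```

In this toy setup the penalty leaves the rarely observed feature with a weight of exactly zero while the informative feature keeps a positive weight, which is the variable-selection behavior that makes LASSO useful when there are hundreds of candidate predictors.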

Results

For predicting ASD diagnosis by 24 months, the logistic regression and random forest models achieved area under the receiver operating characteristic curve (AUROC) values of 0.758 and 0.775, respectively. Predictive accuracy improved further for diagnoses predicted at older ages. When predictors were separated by outpatient and inpatient visits, the random forest model reached an AUROC of 0.834 for prediction at 24 months, with 96.4% specificity and a 20.5% positive predictive value at 40% sensitivity, an encouraging improvement over current screening practice.
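
For readers less familiar with these metrics, the sketch below computes AUROC on a handful of invented scores and shows how positive predictive value (PPV) follows from sensitivity, specificity and prevalence through Bayes' rule. The toy scores and the helper names are assumptions; the only values taken from the abstract are the 40% sensitivity, 96.4% specificity and 20.5% PPV.

```python
def auroc(scores, labels):
    """Share of positive/negative pairs in which the positive case scores higher."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def implied_prevalence(sensitivity, specificity, target_ppv):
    """Prevalence at which the given sensitivity and specificity yield target_ppv."""
    false_pos_rate = 1.0 - specificity
    return (target_ppv * false_pos_rate) / (
        sensitivity * (1.0 - target_ppv) + target_ppv * false_pos_rate
    )

# Invented risk scores: three children with a diagnosis (1), four without (0).
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)

# Back-of-envelope check on the figures reported in the abstract.
prev = implied_prevalence(0.40, 0.964, 0.205)
print(round(toy_auc, 3), round(prev, 4), round(ppv(0.40, 0.964, prev), 3))
```

Under these assumptions the back-calculated prevalence comes out near 2.3%. That figure is arithmetic on the reported numbers, not a value stated in the paper, but it illustrates why PPV can be modest even at high specificity when the condition is uncommon.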

Conclusions

This study demonstrates that machine learning applied to health claims data can feasibly identify children at elevated risk for ASD at a very early age. The approach shows promise for population-level risk monitoring and for targeting high-risk children for more focused screening and earlier intervention.