Summary: By jointly analyzing genetic sequencing data from individuals with congenital heart disease and from individuals with autism, researchers discovered 23 genes associated with congenital heart defects, 12 of which were not previously reported.
Source: PLOS
Scientists have identified nearly two dozen genes that contribute to congenital heart defects by combining genetic data from people born with heart malformations and from people diagnosed with autism.
Hongyu Zhao of Yale University and colleagues developed a new statistical method to analyze genetic information across related conditions. Their approach, described in a paper published November 4 in PLOS Genetics, leverages shared genetic signals between early-onset disorders to increase the power to detect disease-associated genes.
Prior research has shown that several early-onset disorders share risk genes, with evidence coming from studies of de novo mutations: genetic changes that arise spontaneously in a child and are not inherited from either parent. In particular, analyses of de novo mutations have revealed overlaps between congenital heart disease (CHD) and autism spectrum disorder. However, whole-genome and whole-exome sequencing remain costly, which limits cohort sizes, so studies that focus on a single condition can be underpowered to detect genes with modest effects.
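The power problem can be made concrete with a minimal sketch of a single-trait de novo enrichment test. It assumes a simple Poisson model in which a gene's expected de novo count is 2 × (number of parent-child trios) × (per-generation mutation rate). The function names, cohort sizes, mutation rate, relative risk, and significance threshold are all illustrative assumptions, not values from the study.

```python
import math

def poisson_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)       # P(X = 0)
    cdf = term
    for k in range(1, observed):     # add P(X = 1) ... P(X = observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def critical_count(expected: float, alpha: float) -> int:
    """Smallest count whose Poisson tail probability is at most alpha."""
    count = 0
    while poisson_tail(count, expected) > alpha:
        count += 1
    return count

def power(n_trios: int, mutation_rate: float,
          relative_risk: float, alpha: float) -> float:
    """Chance that a true risk gene reaches significance in one cohort."""
    null_expected = 2 * n_trios * mutation_rate
    threshold = critical_count(null_expected, alpha)
    return poisson_tail(threshold, relative_risk * null_expected)

# Hypothetical settings: Bonferroni threshold for about 20,000 genes,
# a per-gene mutation rate of 2e-5, and a tenfold relative risk.
ALPHA = 0.05 / 20000
MUTATION_RATE = 2e-5
RELATIVE_RISK = 10.0

for n_trios in (2500, 10000):
    print(n_trios, power(n_trios, MUTATION_RATE, RELATIVE_RISK, ALPHA))
```

Under these assumed numbers, the smaller cohort has far less chance of flagging the same gene than the larger one, which is the gap that borrowing information from a related trait is meant to narrow.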
To address these limitations, the team created an algorithm called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations). M-DATA jointly analyzes de novo mutation counts from multiple related traits while incorporating functional annotations, enabling researchers to borrow strength across datasets and improve gene discovery. Applying M-DATA to combined sequencing datasets from CHD and autism cohorts, the researchers identified 23 genes associated with congenital heart disease, including 12 genes not previously linked to CHD.

The investigators report that M-DATA outperforms single-trait analyses because it effectively increases the number of informative genomes by combining data from related conditions. Instead of relying on a smaller sample of genomes from one disease alone, the joint approach aggregates mutation information from multiple cohorts, which raises statistical power and enables detection of genes with subtler effects that might otherwise remain hidden.
Beyond identifying additional candidate genes, the joint-analysis strategy provides insights into the shared genetic architecture between CHD and autism. Understanding which genes contribute to both conditions can clarify biological pathways that operate during early development and may point to mechanisms relevant across multiple disorders. Such findings have potential value for prioritizing genes for functional follow-up and for informing future studies aimed at prevention or treatment.
Zhao notes, “By jointly analyzing de novo mutations from congenital heart disease (CHD) and autism, we identified novel genes that may play an important role in explaining the shared genetic etiology of CHD and autism.” The result highlights how integrating data across traits can reveal connections that single-trait studies might miss.
Yuhan Xie, the lead student on the project, adds, “As a biostatistics student, it’s very motivating to find what could be meaningful to the patients and their families.” The team emphasizes that method development in biostatistics and human genetics not only advances discovery but also helps translate sequencing data into knowledge that can eventually support clinical research.
About this genetics research news
Author: Hongyu Zhao
Source: PLOS
Contact: Hongyu Zhao – PLOS
Image: The image is credited to Yuhan Xie
Original Research: Open access.
Title: “M-DATA: A statistical approach to jointly analyzing de novo mutations for multiple traits” by Xie Y, Li M, Dong W, Jiang W, Zhao H. PLOS Genetics
Abstract
Recent studies have shown that several early-onset diseases share risk genes, particularly when examined through de novo mutations (DNMs). This observation suggests that integrating information across related traits can improve the ability to identify genes involved in any one trait. Despite this potential, few statistical methods exist to jointly analyze DNMs across multiple diseases.
In response, the authors developed M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations). The framework increases association power by combining DNM counts from correlated traits and by integrating available functional annotations that inform which genes are more likely to be disease-relevant.
M-DATA employs an Expectation-Maximization algorithm to estimate both the degree of shared genetic association between two diseases and the probability that each gene is associated with each trait. In an applied case study, the method jointly analyzed data from congenital heart disease and autism cohorts. The joint analysis identified 23 genes associated with CHD, including 12 novel candidate genes, substantially more than a comparable single-trait analysis. These results offer new insights into CHD etiology and demonstrate the advantages of multi-trait DNM analysis for gene discovery.
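The Expectation-Maximization idea described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each gene belongs to one of four latent classes (risk for neither trait, trait 1 only, trait 2 only, or both), that de novo counts are Poisson, and that risk genes share one fixed relative risk `gamma`. It omits functional annotations entirely and runs on simulated data. All names and numbers are hypothetical.

```python
import math
import random

# Latent classes: (risk gene for trait 1?, risk gene for trait 2?)
CLASSES = [(0, 0), (1, 0), (0, 1), (1, 1)]

def log_poisson(k, rate):
    """Log of the Poisson probability of observing k events at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def e_step(counts, expected, pi, gamma):
    """Posterior probability of each latent class for every gene."""
    resp = []
    for (y1, y2), (e1, e2) in zip(counts, expected):
        logw = [
            math.log(max(p, 1e-300))
            + log_poisson(y1, e1 * (gamma if r1 else 1.0))
            + log_poisson(y2, e2 * (gamma if r2 else 1.0))
            for p, (r1, r2) in zip(pi, CLASSES)
        ]
        top = max(logw)                      # subtract max for numerical stability
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        resp.append([v / total for v in w])
    return resp

def m_step(resp):
    """Update class proportions as the average posterior across genes."""
    return [sum(r[k] for r in resp) / len(resp) for k in range(len(CLASSES))]

def fit(counts, expected, gamma, n_iter=100):
    pi = [0.25] * len(CLASSES)
    for _ in range(n_iter):
        pi = m_step(e_step(counts, expected, pi, gamma))
    return pi, e_step(counts, expected, pi, gamma)

def trait1_posterior(resp_row):
    """Probability the gene is associated with trait 1 (alone or shared)."""
    return resp_row[1] + resp_row[3]

def sample_poisson(rate, rng):
    """Knuth's method; adequate for the small rates used here."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

# Simulated data under assumed settings (not the study's data).
rng = random.Random(0)
GAMMA = 20.0
TRUE_PI = [0.90, 0.03, 0.03, 0.04]
expected = [(0.1, 0.2)] * 1000               # null expected counts per trait
counts = []
for e1, e2 in expected:
    r1, r2 = rng.choices(CLASSES, weights=TRUE_PI)[0]
    counts.append((sample_poisson(e1 * (GAMMA if r1 else 1.0), rng),
                   sample_poisson(e2 * (GAMMA if r2 else 1.0), rng)))

pi_hat, resp = fit(counts, expected, GAMMA)
print("estimated class proportions:", pi_hat)
print("estimated shared proportion:", pi_hat[3])
```

In this sketch the proportion assigned to the "both" class plays the role of the shared association between the two diseases, and `trait1_posterior` gives a per-gene association probability. When the shared class is common, a gene's counts in the second trait raise or lower its posterior for the first, which is how the joint model borrows strength.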
By providing a flexible statistical framework for leveraging shared genetic signals across disorders, M-DATA represents a practical tool for researchers studying the genetics of early-onset and developmentally related conditions. Continued application and refinement of such approaches may accelerate gene discovery and deepen our understanding of the biological links among related diseases.