Summary: Researchers at the University of California San Diego School of Medicine applied AI-driven analysis to electronic health records and genomic data to discover hundreds of genes linked to tobacco use disorder (TUD) and to nominate hundreds of drug candidates for potential treatment. By combining large-scale clinical records with genome-wide association methods, the team pinpointed genetic variations that appear to influence the transition from occasional tobacco use to chronic, disordered use. Roughly 1.3 billion people worldwide use tobacco.
Their work demonstrates the value of electronic health records as a rich resource for genetic discovery and for accelerating the search for new treatment strategies. The findings provide fresh biological insights into TUD and highlight therapeutic targets that merit laboratory and clinical follow-up.
Key Facts:
- The meta-analysis identified 461 candidate genes associated with tobacco use disorder, indicating a substantial genetic component to the condition; most of these genes are expressed in the brain.
- By integrating artificial intelligence with genome-wide association studies, researchers accelerated gene discovery across diverse populations, improving efficiency and reducing the time and cost typical of traditional approaches.
- The study combined data from nearly 900,000 people across multiple biobanks and health systems through the PsycheMERGE Network, emphasizing the scale and potential public health impact of the findings.
Source: UCSD
Overview: Investigators led by Sandra Sanchez-Roige, Ph.D., associate professor in the Department of Psychiatry at UC San Diego School of Medicine, used electronic health records (EHRs) linked to genetic data to investigate the biology of tobacco use disorder. Published April 17, 2024, in Nature Human Behaviour, the study used a multi-ancestry meta-analysis to identify genetic risk factors and to probe links between TUD and a range of medical and psychiatric outcomes.
“Tobacco use disorder has an enormous impact on public health,” said Sanchez-Roige. “Developing new treatments has been difficult because the genetic drivers of TUD are not well understood. By using EHR-derived diagnoses and genome-wide methods across diverse populations, we can capture the genetics that influence the progression from use to disorder.”
The World Health Organization estimates approximately 1.3 billion tobacco users globally, with most living in low- and middle-income countries. Tobacco-related harms are wide-reaching: more than 8 million deaths per year are attributable to tobacco, including roughly 1.3 million deaths among non-smokers exposed to secondhand smoke.
Previous genomic studies have identified variants associated with nicotine consumption, but those findings do not fully explain why a subset of users develops a clinical disorder. This study focused on the official diagnostic criteria for tobacco use disorder (behaviors such as using more tobacco than intended, unsuccessful attempts to quit, or continued use despite adverse consequences) to identify genetic contributors to the disorder itself.
Using data from four U.S. biobanks and the UK Biobank—totaling 898,680 individuals spanning European, African American, and Latin American ancestries—the team conducted a genome-wide association meta-analysis. They discovered 88 independent genomic loci and, through functional genomic integration, nominated 461 potential risk genes. Many of these genes show strong expression in brain tissue and overlap with genetic signals for psychiatric traits, externalizing behaviors in children, and multiple medical outcomes including HIV infection, cardiovascular disease, and chronic pain.
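The article does not describe the study's statistical pipeline, so the following is only a rough illustration of what a meta-analysis step does. One common way to pool per-cohort results for a single genetic variant is fixed-effect inverse-variance weighting, in which more precise cohorts count for more. The sketch below is Python with made-up numbers; the function name and the cohort values are hypothetical, and it should not be read as the authors' actual method.

```python
import math

def fixed_effect_meta(estimates):
    """Pool (beta, standard_error) pairs using inverse-variance weights.

    Illustrative helper only; not taken from the study.
    """
    # Each cohort is weighted by 1 / SE^2, so precise estimates count more.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)   # pooled standard error
    return beta, se, beta / se    # pooled effect, SE, and z-score

# Hypothetical per-cohort effects for one variant: (log odds ratio, SE).
cohorts = [(0.12, 0.04), (0.08, 0.05), (0.10, 0.06)]
pooled_beta, pooled_se, z = fixed_effect_meta(cohorts)
print(f"pooled beta={pooled_beta:.3f}, se={pooled_se:.3f}, z={z:.2f}")
```

The pooled standard error is always smaller than that of any single cohort, which is why combining biobanks can reveal associations that no individual dataset has the power to detect.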
The investigators also replicated known genetic associations with smoking behaviors, which strengthened confidence in the EHR-based approach. Importantly, the genetic profile of TUD correlated with both smoking-related measures and broader psychiatric and medical traits, suggesting shared biological pathways that may influence risk across conditions.
Beyond genetic discovery, the study used its findings to prioritize hundreds of existing compounds as potential therapeutic candidates. While these drug suggestions are preliminary and require rigorous laboratory validation and clinical testing, they offer promising starting points for developing better interventions for tobacco dependence.
This research highlights a broader message for the genetics field: electronic health records represent an underused, rapidly expanding source of real-world phenotypic data. “Medical records contain a wealth of information that accumulates during routine care,” Sanchez-Roige noted. “Organizing and analyzing EHR data is challenging, but doing so can reveal insights that traditional study designs miss.”
Funding: This work received support from the California Tobacco-Related Disease Research Program (grants T29KT0526 and T32IR5226) and the National Institute on Drug Abuse (grant DP1DA054394).
About this genetics, AI, and tobacco addiction research news
Author: Miles Martin
Contact: Miles Martin – UCSD
Original Research: Closed access. “Multi-ancestry meta-analysis of tobacco use disorder identifies 461 potential risk genes and reveals associations with multiple health outcomes” by Sandra Sanchez-Roige et al., Nature Human Behaviour.
Abstract (revised summary):
Multi-ancestry meta-analysis of tobacco use disorder identifies 461 potential risk genes and reveals associations with multiple health outcomes
Tobacco use disorder (TUD) is the most common substance use disorder worldwide. While genome-wide association studies have identified variants linked to nicotine consumption, fewer studies have focused on TUD diagnoses. This study combined EHR-derived TUD phenotypes across multiple U.S. biobanks and the UK Biobank to perform a multi-ancestry meta-analysis in 898,680 individuals. The analysis identified 88 independent risk loci and, using functional genomic tools, prioritized 461 candidate risk genes predominantly expressed in brain tissue. Genetic correlations emerged between TUD and smoking behaviors, psychiatric traits, childhood externalizing behaviors, and hundreds of medical outcomes such as HIV infection, heart disease, and chronic pain. These results expand the biological understanding of TUD and support the use of electronic health records as a valuable source for genetic studies of substance use disorders.