Summary: Machine learning markedly improves the accuracy of predicting premature all-cause mortality in a middle-aged population compared with traditional epidemiological models.
Source: University of Nottingham
Computers capable of learning from health data can significantly improve predictions of premature death, offering potential to enhance preventive healthcare, according to a new study by researchers at the University of Nottingham.
Researchers, including healthcare data scientists and clinicians, developed and evaluated machine-learning algorithms to predict the risk of early death from chronic disease in a large middle-aged cohort. The study shows these AI-based models outperform conventional prediction methods crafted by human experts. The findings are published in PLOS ONE within a special collection on “Machine Learning in Health and Biomedicine”.
The team analyzed health information from 502,628 participants aged 40–69 enrolled in the UK Biobank between 2006 and 2010 and followed for outcomes through 2016.
Assistant Professor of Epidemiology and Data Science Dr. Stephen Weng, who led the work, said: “Preventive healthcare is increasingly important for reducing the burden of serious disease. We have been working to improve computerized risk assessment for the general population. Predicting death across multiple causes is more complex than forecasting a single disease because environmental and individual factors interact in many ways.”
“Our approach uses machine learning to build holistic risk prediction models that incorporate a broad range of demographic, biometric, clinical and lifestyle variables for each person. For example, the models consider daily dietary patterns such as consumption of fruit, vegetables and meat, along with clinical measures and lifestyle indicators.”
“We validated the predictions against mortality records from national sources, including the Office for National Statistics, the UK cancer registry and hospital episode statistics. Machine-learned algorithms were significantly more accurate than standard prediction models developed using traditional epidemiological approaches.”
The machine-learning methods applied in the study included random forest and deep learning. The researchers compared these against a simple Cox regression model using only age and sex—which proved the least accurate—and a multivariate Cox regression that improved performance but tended to overestimate risk.
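The Cox models mentioned above turn a weighted sum of risk factors into an absolute risk of death. Below is a minimal sketch of the standard Cox prediction formula for an age-and-sex model. The coefficients, covariate means and baseline survival are hypothetical values chosen for illustration. They are not estimates from the Nottingham study, and the function name is likewise hypothetical.

```python
import math


def cox_absolute_risk(coefs, means, x, baseline_survival):
    """Risk of death by time t from a fitted Cox proportional hazards model.

    risk(t | x) = 1 - S0(t) ** exp(sum_j b_j * (x_j - mean_j))

    coefs             -- log hazard ratios b_j (one per covariate)
    means             -- covariate means used to centre the linear predictor
    x                 -- one person's covariate values
    baseline_survival -- S0(t), survival at time t for a person at the means
    """
    linear_predictor = sum(b * (xi - m) for b, xi, m in zip(coefs, x, means))
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical age-and-sex model: illustrative numbers only,
# not estimates from the UK Biobank study.
COEFS = [0.09, 0.40]   # log hazard ratio per year of age; male vs female
MEANS = [56.0, 0.45]   # mean age; proportion male
S0_7Y = 0.97           # baseline 7-year survival at the covariate means

for age, male in [(45, 0), (60, 1)]:
    risk = cox_absolute_risk(COEFS, MEANS, [age, male], S0_7Y)
    print(f"age {age}, male={male}: 7-year risk {risk:.3f}")
```

A person whose covariates equal the means gets a risk of exactly 1 - S0(t). Everyone else is scaled up or down by their hazard ratio.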
Professor Joe Kai, a clinical academic on the project, commented: “There is growing interest in whether AI and machine learning can improve health outcome prediction. In some settings they help, in others they may not. Here, after careful model tuning, these algorithms provided meaningful improvements in accuracy.”
“Machine-learning techniques can be unfamiliar and technically demanding for many health researchers. By reporting methods transparently, we aim to support scientific verification and encourage responsible development of these tools for healthcare.”
This work builds on earlier research by the Nottingham team showing that multiple AI algorithms—random forest, logistic regression, gradient boosting and neural networks—outperformed an established cardiovascular risk algorithm used in clinical guidelines.
The authors anticipate AI will play a key role in next-generation tools for personalized medicine, enabling risk assessments and prevention strategies tailored to individual patients. They note further research is needed to validate these algorithms across diverse populations and to explore practical ways to integrate them into routine clinical care.
Media Contacts:
Stephen Weng – University of Nottingham
Original Research: Open access.
“Prediction of premature all-cause mortality: A prospective general population cohort study comparing machine-learning and standard epidemiological approaches”
Stephen F. Weng, Luis Vaz, Nadeem Qureshi, Joe Kai.
Published: March 27, 2019 PLOS ONE doi:10.1371/journal.pone.0214365
Abstract
Background
Prognostic modeling with conventional methods is well established for forecasting individual disease risk. Machine learning offers the capacity to model more complex outcomes, such as premature death across multiple causes. This study aimed to develop and compare novel machine-learning algorithms alongside standard survival models to predict premature all-cause mortality.
Methods
A prospective cohort of 502,628 UK Biobank participants aged 40–69 years was recruited from 2006 to 2010 and followed until 2016. Participants were assessed on demographic, biometric, clinical and lifestyle factors. Mortality data coded by ICD-10 were obtained via linkage to the Office for National Statistics. Models included deep learning, random forest and Cox regression. Calibration was evaluated by comparing observed versus predicted risks, and discrimination by area under the receiver operating characteristic curve (AUC).
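The two evaluation checks named above are discrimination (AUC) and calibration (observed versus predicted risk). Below is a minimal sketch of both. It assumes that death during follow-up is treated as a simple yes/no outcome and ignores censoring, which a full survival analysis would handle. The function names are hypothetical. Libraries such as scikit-learn offer `roc_auc_score` and `calibration_curve` for the same tasks.

```python
def auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    death is scored higher than a randomly chosen survivor; ties count half."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one survivor")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_table(y_true, predicted, n_groups=10):
    """Split people into equal-sized groups by predicted risk and return
    (mean predicted risk, observed death rate) for each group."""
    pairs = sorted(zip(predicted, y_true))
    size = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * size // n_groups:(g + 1) * size // n_groups]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table


deaths = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(auc(deaths, risk))
print(calibration_table(deaths, risk, n_groups=2))
```

A well-calibrated model has mean predicted risk close to the observed rate in every group. A model that over-predicts, as the Cox models here did, shows predicted values sitting consistently above the observed ones.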
Findings
Over the follow-up period there were 14,418 deaths (2.9%) across 3,508,454 person-years. A simple age-and-sex Cox model was the least predictive (AUC 0.689, 95% CI 0.681–0.699). A multivariate Cox regression improved discrimination (AUC 0.751, 95% CI 0.748–0.767) but tended to over-predict risk. Machine-learning models further enhanced discrimination: random forest achieved AUC 0.783 (95% CI 0.776–0.791) and deep learning AUC 0.790 (95% CI 0.783–0.797). These represent absolute AUC gains of 9.4 and 10.1 percentage points, respectively, over the simple Cox model. Random forest and deep learning performed similarly. Machine-learning models were well calibrated, while Cox models consistently overestimated risk.
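The headline figures above can be checked with simple arithmetic. The 9.4 and 10.1 improvements are the absolute differences between AUCs. Measured relative to the baseline AUC of 0.689, the same gains are roughly 13.6% and 14.7%. The sketch below uses only numbers reported in the findings. The variable names are illustrative.

```python
deaths = 14_418
participants = 502_628
person_years = 3_508_454

auc_simple_cox = 0.689
auc_random_forest = 0.783
auc_deep_learning = 0.790

crude_proportion = deaths / participants         # share of the cohort who died
rate_per_1000_py = deaths / person_years * 1000  # crude deaths per 1,000 person-years
mean_follow_up = person_years / participants     # average years of follow-up

print(f"died: {crude_proportion * 100:.1f}%, rate: {rate_per_1000_py:.2f} per 1,000 PY, "
      f"mean follow-up: {mean_follow_up:.1f} years")

for name, model_auc in [("random forest", auc_random_forest),
                        ("deep learning", auc_deep_learning)]:
    absolute_gain = model_auc - auc_simple_cox          # difference in AUC
    relative_gain = absolute_gain / auc_simple_cox      # gain relative to baseline AUC
    print(f"{name}: +{absolute_gain * 100:.1f} points, "
          f"+{relative_gain * 100:.1f}% relative")
```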
Conclusions
In this middle-aged population, machine learning substantially improved prediction accuracy for premature all-cause mortality compared with standard approaches. The study demonstrates the value of integrating machine learning within traditional epidemiological designs and highlights the importance of transparent reporting to support verification and future development.