Machine Learning: a New Opportunity for Risk Prediction

Osung Kwon; Wonjun Na; Young-Hak Kim

doi:10.4070/kcj.2019.0314

Globally, cardiovascular disease (CVD) remains the major cause of mortality and morbidity, and identifying people at risk of CVD is the cornerstone of clinical cardiology.1) Accordingly, current guidelines for primary prevention of CVD recommend algorithms to identify asymptomatic patients on the basis of their predicted risk.2)3) These established algorithms are typically developed using multivariate regression models with a limited number of well-established risk factors and generally assume that all such factors are related to the CVD outcomes in a linear fashion, with limited or no interactions between the different factors.4)5) Owing to their restrictive modeling assumptions and limited number of predictors, the existing algorithms generally exhibit suboptimal predictive performance.4)5)
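To make the linearity assumption concrete, consider the Cox proportional hazards model as one illustrative case. The notation is ours and is not taken from any specific guideline algorithm: $h_0(t)$ is the baseline hazard, $x_1, \dots, x_p$ are the risk factors, and $\beta_1, \dots, \beta_p$ are the fitted coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\sum_{j=1}^{p} \beta_j x_j\right)
\quad\Longleftrightarrow\quad
\log\frac{h(t \mid x)}{h_0(t)} = \sum_{j=1}^{p} \beta_j x_j
```

Each factor contributes additively to the log hazard ratio. A joint effect of two factors, or a nonlinear effect of one, is captured only if the analyst adds a term such as $\beta_{jk} x_j x_k$ or a transformation of $x_j$ by hand.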

Along with the emergence of big data, machine learning (ML) provides an alternative to established prediction modelling that may address these limitations. Accumulated medical data and digitalized clinical information enable ML to verify hypotheses generated from conventional statistical analysis and to agnostically discover new predictors of CVD risk. Recently, deep learning (DL), a branch of ML, has become increasingly popular in the medical research community because of its excellent performance in different domains and its rapid methodological improvements.6) DL represents an advance on artificial neural networks, using more layers that permit higher levels of abstraction and improved predictions from data.7) To date, it is the leading ML tool in medical image analysis, with promising results.8) More recently, by virtue of large biomedical training datasets and advanced computing power, DL has been applied to the development of risk prediction models using electronic health data.6)

Against this background, Cho et al. investigated the additional discriminative accuracy of a time-series DL algorithm using repeated-measures data for identifying people at high risk of CVD, in comparison with the Cox proportional hazards regression model.9) The authors found that the time-series DL algorithm showed greater discriminative accuracy than the Cox model approaches. This study extends the potential of DL from models that predict outcomes on the basis of data from specific time points to those that predict future events in complex time-varying datasets. The study used large datasets from a national health screening program and the national health insurance claims database in South Korea for development and validation. Furthermore, prospective cohort data from the Rotterdam Study were used for external validation to assess generalizability across ethnic groups. This design followed the recommendations of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement,10) which strengthens the reliability of the results. In addition, ML, particularly neural networks, is sometimes called a “black box” because of the difficulty of interpretation. To address this, the authors assessed the attribute ranking of the risk predictors in the DL model.
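Discriminative accuracy for time-to-event outcomes is commonly summarized with a concordance statistic. The sketch below computes Harrell's C-index as one such measure. It is an illustration only: the text above does not specify which metric Cho et al. used, the function and variable names are hypothetical, and the toy data are invented. Pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: the fraction of comparable pairs in which the subject
    with the earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the subject
        # with the shorter follow-up actually had an event (not censored).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: follow-up in years, 1 = CVD event, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores_model_a = [0.9, 0.7, 0.4, 0.1]  # hypothetical model output
scores_model_b = [0.2, 0.7, 0.4, 0.9]  # hypothetical model output

c_a = concordance_index(times, events, scores_model_a)
c_b = concordance_index(times, events, scores_model_b)
print(f"C-index, model A: {c_a:.2f}")
print(f"C-index, model B: {c_b:.2f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so two models can be compared on the same validation cohort by comparing their C-indices.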

Data derived from hospital information systems are characterized by enormous volume, heterogeneity, and complexity, and DL could provide a solution for analyzing this kind of complex data. However, as the authors acknowledge in their limitations, only 6 variables, all of them already well-known strong cardiovascular risk factors, were used to develop the risk prediction model, which limits both the value of the study and the benefit of using DL. Considering that the purpose of the study was to confirm the superior analytic performance of DL over Cox regression, the study serves as an initial attempt to navigate the challenges of using DL to develop cardiovascular risk prediction models. Further studies using a large number of diverse variables are required to validate the predictive performance of DL.

DL introduces exciting new opportunities for precision medicine, including risk stratification and future event prediction. Attempts to apply DL methods to patient care and clinical research are already planned or underway. Despite the current hurdles to applying DL to cardiac risk stratification, inspiration and dispassionate effort should lead to more reliable and robust models for realizing personalized cardiovascular health care.
