Journal List > Acute Crit Care > v.40(2) > 1516092097

Sim, Cho, Kim, Lee, Kim, Hahn, Ha, Yun, Kim, Park, Cho, Yu, Ahn, Jeong, Won, Cho, and Lee: Prospective external validation of a deep-learning-based early-warning system for major adverse events in general wards in South Korea

Abstract

Background

Acute deterioration of patients in general wards often leads to major adverse events (MAEs), including unplanned intensive care unit transfers, cardiac arrest, or death. Traditional early warning scores (EWSs) have shown limited predictive accuracy, with frequent false positives. We conducted a prospective observational external validation study of an artificial intelligence (AI)-based EWS, the VitalCare - Major Adverse Event Score (VC-MAES), at a tertiary medical center in the Republic of Korea.

Methods

Adult patients from general wards in internal medicine (IM) and obstetrics and gynecology (OBGYN) were included; OBGYN patients have rarely been investigated in prior AI-based EWS studies. The VC-MAES predictions were compared with National Early Warning Score (NEWS) and Modified Early Warning Score (MEWS) predictions using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and logistic regression for baseline EWS values. False-positives per true positive (FPpTP) were assessed based on the power threshold.

Results

Of 6,039 encounters, 217 (3.6%) had MAEs (IM: 9.5%, OBGYN: 0.26%). Six hours prior to MAEs, the VC-MAES achieved an AUROC of 0.918 and an AUPRC of 0.352, including the OBGYN subgroup (AUROC, 0.964; AUPRC, 0.388), outperforming the NEWS (0.797 and 0.124) and MEWS (0.722 and 0.079). The FPpTP was reduced by up to 71%. Baseline VC-MAES was strongly associated with MAEs (P<0.001).

Conclusions

The VC-MAES significantly outperformed traditional EWSs in predicting adverse events in general ward patients. The robust performance and lower FPpTP suggest that broader adoption of the VC-MAES may improve clinical efficiency and resource allocation in general wards.

INTRODUCTION

Acute deterioration of hospitalized patients in general wards poses a significant challenge, often resulting in unplanned transfers to intensive care units (ICUs), initiation of cardiopulmonary resuscitation (CPR), or even death before ICU transfer. These major adverse events (MAEs) increase mortality and impose substantial financial burdens on both patients and healthcare systems [1-3]. Previous national surveys indicated that 14%–28% of ICU admissions were unplanned transfers from general wards rather than originating from the emergency department (ED) or operating room [4]. Moreover, patients transferred to the ICU from general wards exhibit higher mortality rates than those admitted directly from the ED or following surgery [3].
The factors contributing to acute deterioration are complex and multifaceted, hindering early prediction and intervention; however, many MAEs are preceded by detectable physiological derangements within the previous 24 hours [5,6]. While some such events are inevitable owing to underlying conditions, many are preventable through closer monitoring and timely medical intervention. Observational studies suggest that early intervention can prevent approximately 44% of in-hospital adverse events and 15% of unplanned ICU transfers (UITs) [4,6].
The rapid response system (RRS) was established in the 1990s [7] to enhance surveillance and proactive interventions aimed at preventing deterioration of patients admitted to the general wards. Subsequently, various early warning scores (EWSs) were developed and validated for use together with the RRS when monitoring patient physiological signs and predicting impending adverse events. Although these scoring systems have demonstrated promise [8], their reliance on generic thresholds often fails to account for individual patient variability, resulting in suboptimal predictive performance [9,10]. In a randomized controlled trial by Haegdorens et al. [11], an RRS utilizing the National Early Warning Score (NEWS) did not significantly reduce the incidence of UITs, cardiac arrest, or unexpected death. Furthermore, concerns have been raised regarding the high false-alarm rates generated by these systems, which can disrupt the RRS workflow [9,12,13].
To address these limitations, there is growing interest in incorporating artificial intelligence (AI) technologies into EWSs [14]. The use of AI-based algorithms in EWS may improve predictive accuracy by analyzing large volumes of clinical data and detecting subtle patterns often overlooked by conventional rule-based methods [15]. These advanced systems offer tailored monitoring by adapting to the unique profiles of individual patients, potentially delivering more precise and timely alerts [16]. However, the performance and impact of AI-based EWSs in real-world clinical settings require validation in prospective clinical studies [17].
We developed an AI-based clinical decision support system, called the VitalCare - Major Adverse Event Score (VC-MAES), to predict in-hospital MAEs, including UITs, cardiac arrest, or death among patients admitted to the general wards. The VC-MAES is generated in real time by analyzing structured data from electronic health records (EHRs), facilitating prompt identification of and intervention for patients at high risk of clinical deterioration.
This study aimed to externally validate the performance of VC-MAES by prospectively collecting real-world medical data. Additionally, we sought to compare its predictive accuracy with those of two widely used and extensively studied traditional EWSs, the NEWS and Modified Early Warning Score (MEWS).

MATERIALS AND METHODS

The study adhered to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Institutional Review Board of Keimyung University School of Medicine (No. 2022-12-081), which waived the requirement for informed consent due to the retrospective nature of this study. The study was registered with the Clinical Research Information Service (CRIS) operated by the National Institute of Health under the Korea Disease Control and Prevention Agency (CRIS Registration No. KCT0008466).

Study Design and Setting

This prospective observational external validation study was conducted at Keimyung University Dongsan Hospital in the Republic of Korea. The model was implemented in six medical-surgical general wards across two departments: internal medicine (IM) and obstetrics and gynecology (OBGYN). These two departments were selected because our primary aim was to validate the model's performance across markedly different patient populations, rather than simply splitting cases into medical versus surgical categories. Additionally, we sought to validate our AI-based EWS in OBGYN populations, which are often underrepresented in such studies. This was a non-interventional study in which the model generated real-time predictions that were neither disclosed to healthcare providers nor used to guide clinical decision-making. The study period was from June 3, 2023, to January 31, 2024, and patients were followed until discharge or April 30, 2024, whichever occurred first.

Patient Selection

Patients were eligible for inclusion if they met the following criteria: (1) 19 years of age or older and (2) had five initial vital signs (systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, and body temperature) recorded in their EHR during hospitalization. Patients were excluded if they had been admitted to the labor and delivery unit or directly to the ICU from the ED or operating room. All other patients, including those with do-not-resuscitate (DNR) orders, were included.

Data Collection

Patient demographics and clinical information, including vital signs, chief complaints, admission and final diagnoses, code status, start and end times of surgery, medication orders, and laboratory data, were extracted from the EHR. Data addressing ICU admission and discharge times, time of death, and CPR initiation and termination times were also retrieved.

Outcomes and Objectives

The primary outcomes of interest were UITs, cardiac arrest, and death. Cardiac arrest was defined as the initiation of chest compressions, defibrillation, or both, as documented in the EHR [18]. We defined UITs as unplanned when they occurred unexpectedly from general wards, as opposed to pre-arranged events, such as scheduled postoperative admissions or direct transfers from the ED [11,19].

Algorithm: VC-MAES

The VC-MAES is a proprietary predictive system that applies a deep-learning approach to time-series clinical data. Building on a bidirectional long short-term memory (biLSTM) architecture, this binary classification model estimates the likelihood of MAEs in general ward patients 6 hours in advance. The VC-MAES uses two categories of input data: (1) dynamic features derived from hourly time-series data, including vital signs and laboratory test results, and (2) static features. While the biLSTM component processes the time-series inputs, the static features are handled by fully connected layers. The outputs from both networks are subsequently merged and passed through additional classification layers to obtain the final prediction. A schematic of the VC-MAES model architecture is shown in Supplementary Figure 1, and additional information regarding the classification model can be found in Sung et al. [20].
The model was trained using the entire inpatient data, comprising 334,185 hospitalizations of 209,825 adult patients between 2013 and 2017 at Severance Hospital, a 2,454-bed tertiary academic medical center in Seoul, Republic of Korea. The dataset included patients from more than 35 medical and surgical specialties. The model primarily uses five vital signs and patient ages to generate a VC-MAES ranging from 0 to 100, with higher scores indicating a greater risk of MAEs within the next 6 hours. When used prospectively, the score is updated whenever a new input feature is recorded in the EHR, ensuring that it reflects the most recent data. If available, the model can also incorporate optional variables, such as oxygen saturation, Glasgow Coma Scale (GCS), total bilirubin, lactate, creatinine, platelets, pH, sodium, potassium, hematocrit, white blood cell count, bicarbonate, and C-reactive protein, to provide a more comprehensive risk score.
The VC-MAES imputes missing values using the last-observation-carried-forward (LOCF) method. In this study, if no previous values were available, normal values were assigned (Supplementary Table 1). Because mental status assessments were only performed in the ICUs and not in the general wards at this institution, GCS scores were assigned a value of 15 to calculate the VC-MAES, following the model’s missing value imputation method. Similarly, for the NEWS and MEWS calculations, a score of 0 was assigned for the Alert, Voice, Pain, Unresponsive (AVPU) scale, corresponding to an “alert” level of consciousness.
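The imputation rule described above can be illustrated with a minimal Python sketch. It is not the VC-MAES implementation: the function name `locf_impute`, the feature names, and the heart-rate fallback of 80 are hypothetical choices made for illustration (the study's actual normal values are listed in Supplementary Table 1). Only the GCS fallback of 15 is taken from the text.

```python
from typing import Dict, List, Optional

# Assumed fallback values for illustration only. The GCS value of 15 follows
# the text; the heart-rate value is a placeholder, not the study's value.
ASSUMED_NORMALS: Dict[str, float] = {"heart_rate": 80.0, "gcs": 15.0}


def locf_impute(series: List[Optional[float]], normal: float) -> List[float]:
    """Fill each gap with the last observed value, or `normal` if none exists yet."""
    filled: List[float] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value  # a new observation replaces the carried value
        filled.append(last if last is not None else normal)
    return filled


# Hourly heart rate with gaps: the first hour has no earlier value to carry.
hourly_heart_rate = [None, 92.0, None, None, 110.0, None]
imputed_hr = locf_impute(hourly_heart_rate, ASSUMED_NORMALS["heart_rate"])

# A feature that was never measured falls back to the normal value throughout,
# as happened for GCS in the general wards of this study.
never_measured_gcs = [None, None, None]
imputed_gcs = locf_impute(never_measured_gcs, ASSUMED_NORMALS["gcs"])

print(imputed_hr)
print(imputed_gcs)
```

Carrying the last value forward keeps the score responsive to the most recent measurement, whereas the normal-value fallback implicitly treats an unmeasured feature as unremarkable.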

Performance Evaluation and Statistical Analysis

We compared the demographic characteristics of patients in the non-event and event groups using chi-square tests for categorical variables and t-tests or Wilcoxon rank-sum tests for continuous variables, as applicable. The overall accuracy of the VC-MAES in predicting MAEs at different time intervals was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The AUPRC analysis was conducted because our dataset was highly imbalanced: the AUPRC summarizes precision and recall and therefore focuses on the positive cases. A larger AUPRC indicates better performance in detecting the positive class, which is crucial when the aim is to identify rare events in imbalanced datasets. A bootstrap approach with 1,000 replicate samples drawn with replacement was employed to compare the AUROC and AUPRC values of the predictive models, forming empirical distributions to calculate confidence intervals and derive P-values from the differences in performance metrics. Additionally, to evaluate the threshold-based performance for each relevant NEWS and MEWS cutoff, we calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and number needed to evaluate (NNE). These metrics were used to benchmark the performance of the established models against VC-MAES. Last, we compared the performance of the EWSs while excluding DNR patients to avoid selection bias due to markedly different patient demographics, characteristics, and/or care.
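The bootstrap comparison described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the study's analysis code (which used R with pROC and Python): the function names, the toy data, the percentile confidence interval, and the two-sided P-value rule are all assumptions, and the pairwise AUROC computation is only practical for small samples.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_diff(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Paired bootstrap of AUROC(a) - AUROC(b): 95% percentile CI and a P-value."""
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain only one class
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auroc(y, a) - auroc(y, b))
    diffs.sort()
    ci_low = diffs[int(0.025 * n_boot)]
    ci_high = diffs[int(0.975 * n_boot) - 1]
    # Assumed P-value rule: twice the smaller tail share around zero, capped at 1.
    share_le = sum(d <= 0 for d in diffs) / n_boot
    share_ge = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(share_le, share_ge))
    return ci_low, ci_high, p_value


# Toy data (not study data): model A separates the classes, model B does not.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores_a = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.9, 0.8, 0.85, 0.7]
scores_b = [0.1, 0.6, 0.15, 0.7, 0.25, 0.05, 0.5, 0.8, 0.3, 0.2]

ci_low, ci_high, p_value = bootstrap_auroc_diff(labels, scores_a, scores_b)
print(auroc(labels, scores_a), auroc(labels, scores_b), ci_low, ci_high, p_value)
```

Both models are scored on the same resampled indices, so each bootstrap difference is paired; the same loop could be reused for the AUPRC by swapping the metric function.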
In addition to the time-point analyses, we assessed the model using the patient episode definition [21], in which each hospital admission was treated as a single comprehensive case that drew on one or more measurements of vital signs and other clinical parameters obtained throughout the hospital stay. Specifically, the first recorded VC-MAES, NEWS, and MEWS results upon admission were used, and binary logistic regression was performed to examine the association between these EWSs and in-hospital MAEs. Furthermore, we investigated whether the initial scores were associated with prolonged length of stay (pLOS) to explore broader clinical utility beyond early deterioration prediction, such as resource allocation and guiding early discharge planning. We defined pLOS as the 75th-percentile length-of-stay threshold for the entire cohort [22,23]. Demographic variables that were significantly associated with MAEs and pLOS in the univariate analyses were included as covariates in the multivariate logistic regression model. All statistical analyses were conducted using R (version 4.4.0) and Python (version 3.11.8). The pROC package in R was used for ROC curve analysis. Statistical significance was set at P<0.05.
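The pLOS definition above amounts to a percentile cutoff applied to the whole cohort. The sketch below shows one way to derive such a cutoff in Python; the length-of-stay values are invented toy data, the variable names are hypothetical, and the choice of the inclusive quantile method is an assumption, since the text does not state which percentile estimator was used.

```python
import statistics

# Toy lengths of stay in days (not study data).
los_days = [1, 2, 2, 3, 3, 4, 5, 7, 9, 14]

# The third quartile cut point is the 75th percentile of the cohort.
plos_threshold = statistics.quantiles(los_days, n=4, method="inclusive")[2]

# An encounter is flagged as prolonged when its stay meets or exceeds the cutoff.
is_plos = [days >= plos_threshold for days in los_days]

print(plos_threshold, sum(is_plos))
```

The resulting binary flag is the kind of outcome that a baseline score can then be regressed against in a logistic model.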

RESULTS

Baseline Characteristics

This study initially included 6,478 screened encounters, comprising 4,846 unique patients admitted to the general ward. After applying the exclusion criteria, 439 cases were removed: 423 were directly admitted to the ICU from the ED or operating room, and 16 had insufficient data for score calculation. A flowchart showing the patient selection procedures is provided in Figure 1. Ultimately, 6,039 encounters (4,447 patients) with data for 272,493 time points were included in the analyses. Of these, 217 encounters (3.6%) experienced MAEs, including 102 UITs, 13 cardiac arrests, and 102 deaths. The majority of MAEs occurred in the IM subgroup, with 207 of 2,177 cases (9.5%) experiencing MAEs. In contrast, only 10 of the 3,862 OBGYN cases (0.26%), including 1,568 obstetric cases, experienced MAEs.
Table 1 summarizes the baseline demographic characteristics and laboratory values of the event and non-event groups. The median age of the study population was 52.0 years. Patients who experienced MAEs had a significantly higher median age of 76.0 years compared to those without MAEs (51.0 years, P<0.001). The sex distribution revealed that 78.90% of the participants were female, largely because nearly 64% of the study population was drawn from the OBGYN department. At admission, VC-MAES, NEWS, and MEWS results were significantly elevated in the MAE group, reflecting more severe clinical presentations (VC-MAES, 6.2 vs. 1.4; NEWS, 3.0 vs. 1.0; and MEWS, 2.0 vs. 1.0; all P<0.001).
In a subanalysis stratified by specialty, the IM subgroup (n=2,177) had a median age of 70.0 years, which was older than that of the overall cohort. Among IM patients who experienced MAEs (n=207), the median age increased further to 78.0 years (P<0.001). Additionally, IM patients with MAEs exhibited significantly elevated baseline VC-MAES, NEWS, and MEWS compared to IM patients without MAEs (VC-MAES, 6.3 vs. 3.9; NEWS, 3.0 vs. 1.0; and MEWS, 2.0 vs. 1.0; all P<0.001).
In contrast, the OBGYN subgroup (n=3,862) consisted predominantly of younger females (median age, 41.0 years), with only 10 MAEs reported. Those who experienced MAEs were significantly older (60.5 years) than their counterparts without MAEs and also presented with significantly higher VC-MAES, NEWS, and MEWS at admission (mean VC-MAES, 4.9 vs. 0.9; NEWS, 2.5 vs. 1.0; MEWS, 2.5 vs. 1.0; all P<0.001). Baseline demographic characteristics and laboratory values for both the non-event and event groups within the IM and OBGYN cohorts are presented in Supplementary Tables 2 and 3, respectively.

Model Performance

The VC-MAES tool consistently demonstrated superior performance over the NEWS and MEWS, as shown by both AUROC and AUPRC results across all evaluated time points. Six hours prior to the MAEs, the VC-MAES achieved an AUROC of 0.918 (95% CI, 0.912–0.924) and an AUPRC of 0.352 (95% CI, 0.330–0.374). In comparison, the NEWS yielded an AUROC of 0.797 (95% CI, 0.784–0.810) and an AUPRC of 0.124 (95% CI, 0.110–0.139), whereas the MEWS attained an AUROC of 0.722 (95% CI, 0.707–0.737) and an AUPRC of 0.079 (95% CI, 0.069–0.090; bootstrap test, P<0.001) (Figure 2). This trend remained consistent for the 12- and 24-hour intervals preceding the event. A comprehensive comparison of the results is presented in Table 2 and Figure 3.
For individual events, such as UIT, cardiac arrest, or death, the VC-MAES again outperformed both the NEWS and MEWS. For UIT, the VC-MAES achieved an AUROC of 0.890, compared with 0.698 for the NEWS and 0.627 for MEWS (bootstrap test, P<0.001). For prediction of death, the VC-MAES attained an AUROC of 0.994, outperforming both the NEWS (0.937) and the MEWS (0.850; bootstrap test, P<0.001). For cardiac arrest, the VC-MAES demonstrated an AUROC of 0.758 compared with 0.699 for the NEWS and 0.641 for the MEWS (bootstrap test, P<0.001).
In the IM subgroup (n=2,177), the VC-MAES outperformed the NEWS and MEWS, achieving an AUROC of 0.852 (95% CI, 0.841–0.862) and an AUPRC of 0.350 (95% CI, 0.327–0.375). In contrast, the NEWS produced an AUROC of 0.762 (95% CI, 0.748–0.775) and AUPRC of 0.119 (95% CI, 0.104–0.135), whereas the MEWS produced an AUROC of 0.697 (95% CI, 0.682–0.712) and AUPRC of 0.076 (95% CI, 0.067–0.086) (Figure 4A and B).
Similarly, in the OBGYN subgroup (n=3,862), the VC-MAES demonstrated the highest discriminative ability, with an AUROC of 0.964 (95% CI, 0.943–0.982) and an AUPRC of 0.388 (95% CI, 0.303–0.472). In contrast, the NEWS yielded an AUROC of 0.928 (95% CI, 0.891–0.961) and an AUPRC of 0.299 (95% CI, 0.225–0.385), whereas the MEWS attained an AUROC of 0.864 (95% CI, 0.816–0.912) and an AUPRC of 0.196 (95% CI, 0.139–0.267) (Figure 4B).
When excluding the 418 DNR patients, which resulted in 5,472 encounters, the VC-MAES model maintained superior performance with an AUROC of 0.862 (95% CI, 0.848–0.874) and an AUPRC of 0.032 (95% CI, 0.024–0.045). In comparison, the NEWS achieved an AUROC of 0.730 (95% CI, 0.702–0.758) and an AUPRC of 0.026 (95% CI, 0.018–0.036), while the MEWS demonstrated an AUROC of 0.700 (95% CI, 0.674–0.724) and an AUPRC of 0.020 (95% CI, 0.012–0.030).

Threshold-Based Performance and False-Alarm Reduction

Table 3 presents the performance metrics for the VC-MAES, NEWS, and MEWS across the key cutoff values. Compared with MEWS at a cutoff of 3 (specificity: 92.4%, sensitivity: 48.4%, PPV: 4.1%, NNE: 24.4), VC-MAES at a cutoff of 30 (specificity: 94.3%, sensitivity: 65.8%, PPV: 7.1%, NNE: 14.1) demonstrated improved specificity and sensitivity, with a 42% reduction in NNE. In terms of false-positives per true positive (FPpTP; calculated as NNE – 1), this improvement corresponds to a 44% reduction. Similarly, at a higher specificity range (approximately 97%), the MEWS at a cutoff of 4 (sensitivity: 36.5%, PPV: 7.3%, NNE: 13.7) was outperformed by the VC-MAES at a cutoff of 40 (sensitivity: 58.6%, PPV: 11.2%, NNE: 8.9), achieving a 35% reduction in NNE and a 38% reduction in FPpTP.
A similar trend was observed when comparing the VC-MAES and NEWS. At a specificity range of 96–98%, the NEWS at a cutoff of 5 (sensitivity: 45.2%, PPV: 7.8%, NNE: 12.8) was surpassed by the VC-MAES at a cutoff of 50 (sensitivity: 52.4%, PPV: 17.5%, NNE: 5.7), reflecting a 55% reduction in NNE and a 60% reduction in FPpTP. Even at the highest specificity level (≥99%), the NEWS at a cutoff of 7 (sensitivity: 28.5%, PPV: 16.3%, NNE: 6.1) was outperformed by the VC-MAES at a cutoff of 70 (sensitivity: 37.5%, PPV: 39.5%, NNE: 2.5). This represents a 59% reduction in NNE and a 71% reduction in FPpTP.
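The relationship between the reported metrics can be checked with a short worked calculation. The sketch below takes the NNE values quoted in the two paragraphs above, derives FPpTP as NNE – 1 (as defined in the text), and recomputes the relative reductions. The helper names and the dictionary layout are hypothetical, and the identity NNE = 1/PPV is the standard definition rather than something stated explicitly in the text.

```python
from typing import Dict, Tuple


def nne_from_ppv(ppv: float) -> float:
    """Number needed to evaluate: alerts reviewed per true positive."""
    return 1.0 / ppv


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# (traditional score NNE, VC-MAES NNE) for each comparison quoted in the text.
comparisons: Dict[str, Tuple[float, float]] = {
    "MEWS>=3 vs VC-MAES>=30": (24.4, 14.1),
    "MEWS>=4 vs VC-MAES>=40": (13.7, 8.9),
    "NEWS>=5 vs VC-MAES>=50": (12.8, 5.7),
    "NEWS>=7 vs VC-MAES>=70": (6.1, 2.5),
}

reductions: Dict[str, Tuple[int, int]] = {}
for name, (nne_old, nne_new) in comparisons.items():
    nne_cut = relative_reduction(nne_old, nne_new)
    # FPpTP = NNE - 1: every evaluated alert except the one true positive.
    fpptp_cut = relative_reduction(nne_old - 1.0, nne_new - 1.0)
    reductions[name] = (round(100 * nne_cut), round(100 * fpptp_cut))
    print(f"{name}: NNE -{reductions[name][0]}%, FPpTP -{reductions[name][1]}%")

# Consistency check: a PPV of 4.1% corresponds to an NNE of about 24.4.
print(round(nne_from_ppv(0.041), 1))
```

Because FPpTP subtracts the one true positive from both sides, its relative reduction is always larger than the NNE reduction for the same pair of cutoffs, which is why the 59% NNE reduction corresponds to a 71% FPpTP reduction.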

Associations between Baseline EWSs and MAEs or pLOS

Using the patient episode definition, we performed binary logistic regression with the first recorded (baseline) VC-MAES, NEWS, and MEWS to evaluate their associations with MAEs and pLOS. pLOS was defined as a stay of ≥7 days, corresponding to the 75th percentile. Baseline VC-MAES showed a strong association with both MAEs and pLOS. Specifically, for MAEs, baseline VC-MAES had an odds ratio (OR) of 2.065 (95% CI, 1.871–2.280), whereas baseline NEWS and baseline MEWS had ORs of 1.051 (95% CI, 1.045–1.057) and 1.064 (95% CI, 1.054–1.073), respectively (Table 4).
After adjusting for age, sex, and body mass index, baseline VC-MAES continued to exhibit the strongest association with both MAEs and pLOS. For MAEs, baseline VC-MAES achieved an adjusted OR of 1.484 (95% CI, 1.333–1.651), surpassing those of the NEWS (adjusted OR, 1.032; 95% CI, 1.026–1.038) and MEWS (adjusted OR, 1.041; 95% CI, 1.032–1.052). Detailed comparisons of ORs and confidence intervals for pLOS are available in Supplementary Table 4.

DISCUSSION

In this prospective observational external validation study, we tested and validated the VC-MAES, a deep-learning-based EWS, in general ward patients, including a substantial proportion of OBGYN patients. Overall, the VC-MAES demonstrated superior predictive performance for MAEs, significantly outperforming both the NEWS and MEWS. Notably, this advantage was also pronounced in the OBGYN subgroup, highlighting its strong discriminative performance in a population seldom investigated in previous AI-based EWS studies. This finding underscores the potential applicability of the VC-MAES in both low- and high-risk patients admitted to the general wards, as demonstrated by its consistent performance in both the OBGYN and IM subgroups. Furthermore, the baseline VC-MAES at admission was a predictor of both MAEs and pLOS, suggesting its potential value as a prognostic and triage tool upon patient arrival.
External validation is essential to confirm the reliability and generalizability of AI models, as they frequently demonstrate diminished performance in external validation settings. This performance degradation often results from overfitting during training or shifts in the distribution of input features, a phenomenon referred to as “dataset shift” [24,25]. Recent external validation studies have demonstrated significant performance degradation in clinical deterioration prediction models [26,27], highlighting the challenges of maintaining model performance across settings and patient demographics. These findings emphasize the importance of rigorous external validation to identify potential biases and guarantee the robustness and generalizability of models across diverse clinical contexts.
The VC-MAES was originally trained using medical records from a tertiary academic medical center in the Republic of Korea. Although the current study was also conducted at a tertiary academic medical center in the Republic of Korea, this hospital was located in a province with distinct healthcare systems and patient demographics. Despite this substantial data shift, the VC-MAES maintained strong performance, suggesting its potential for broader applicability across various clinical settings. In our study, the overall prevalence of MAEs was 3.6%, consistent with previously reported ranges of 3%–9% [28]. However, the incidence of MAEs differed significantly between the two specialties assessed: 0.26% for the OBGYN services and 9.5% for the IM services. Despite these differences in MAE incidence, the VC-MAES maintained strong performance, achieving an AUROC of 0.964 and an AUPRC of 0.388 in the OBGYN subgroup and an AUROC of 0.852 and an AUPRC of 0.350 in the IM subgroup, demonstrating its applicability to both low- and high-risk patient populations. Notably, the VC-MAES achieved its best performance in the OBGYN subgroup, mirroring findings from a recent multicenter study of an AI-based EWS by Churpek et al. [29], who reported a slightly higher performance among female patients than among male patients (AUROC, 0.844 vs. 0.824) and particularly strong results in obstetric encounters (AUROC, 0.909).
Previous studies have indicated that a significant barrier to healthcare provider adoption of AI-based algorithms is the concern about increased false alarms from continuous monitoring, which could lead to alarm fatigue and workflow disruption [30]. We found that the VC-MAES model could reduce false-positive MAE predictions by up to 71%, improving clinical efficiency by minimizing unnecessary evaluations. By curtailing superfluous alerts, healthcare facilities can optimize resource allocation, enabling providers to focus on critical patient care tasks.
This study has several limitations. First, as a non-interventional, single-center study, it was inherently susceptible to various biases and confounding variables. Nonetheless, the prospective design helped mitigate some biases commonly associated with retrospective validation studies, providing a useful foundation for future interventional research. Second, the validation datasets were derived from a single hospital in the Republic of Korea, which may limit the generalizability of our findings to other countries with different healthcare systems and patient demographics. Additionally, the study population was predominantly female, largely because of the substantial number of patients from the OBGYN department. This sex imbalance also limits the applicability of these results to other clinical settings. Third, although the VC-MAES outperformed the NEWS and MEWS, the low incidence of cardiac events and overall adverse events in the OBGYN population limited further sub-analyses and more comprehensive verification of the model performance in that subgroup. Fourth, the model performance can be affected by the quality and completeness of EHR data, which vary across institutions. Although the VC-MAES uses the LOCF method and normal-value imputation for missing data, this approach may not fully capture the clinical context. For instance, in this study, most GCS scores were missing and were imputed to be normal (GCS=15) to maintain consistency in the analyses, which may have influenced the model’s overall performance. Finally, the study did not include an interventional component; therefore, we were unable to assess the impact of the VC-MAES on clinical workflow, patient outcomes, or physician engagement in a real-world environment.
In conclusion, our study demonstrated that the VC-MAES significantly outperformed traditional scoring systems in predicting MAEs in hospitalized patients. Despite challenges such as data shifts, the VC-MAES maintained strong performance across clinical settings, showing the potential to reduce false-positive predictions of MAEs and enhance patient outcomes. Future research should focus on validating these findings in diverse clinical settings to ensure the robustness and generalizability of the model. Interventional studies are required to assess the real-world impact of the VC-MAES on patient outcomes, clinical workflow, and physician engagement.

KEY MESSAGES

▪ In this prospective external validation study, VitalCare - Major Adverse Event Score (VC-MAES)—a deep-learning-based early warning system—demonstrated superior predictive accuracy for major adverse events compared with traditional early warning scores (National Early Warning Score and Modified Early Warning Score).
▪ Despite the markedly different demographic profiles of general ward patients in internal medicine and obstetrics and gynecology settings (representing high- and low-risk cohorts, respectively), VC-MAES demonstrated high predictive performance, underscoring its potential for broader generalizability.
▪ VC-MAES also reduced false-positives by up to 71%, suggesting the possibility of enhanced clinical efficiency; however, further studies are warranted to confirm its impact on workflow and patient outcomes.

Notes

CONFLICT OF INTEREST

TS, EYC, JHK, KHL, KJK, YJ, SH, JYW, BEA, EY, and KBL are employees of AITRICS. No other potential conflicts of interest relevant to this article were reported.

FUNDING

This study was financially and administratively supported by the Ministry of Health and Welfare, Korea Health Industry Development Institute, and Daegu Metropolitan City.

ACKNOWLEDGMENTS

None.

AUTHOR CONTRIBUTIONS

Conceptualization: TS, JHK, KJK, BEA, HC, KBL. Methodology: TS, EYC, KJK, HC, KBL. Formal analysis: EYC, KHL, EYH, EY, SH, ICK, SHP, CHC, GIY, YJ, JYW. Data curation: EYC, KHL, EYH, EY, SH, ICK, SHP, CHC, GIY, YJ, JYW. Visualization: EYC, SH. Project administration: YJ, JHK, BEA, KJK. Writing – original draft: TS. Writing – review & editing: EYC, JHK, KJK, BEA, HC, KBL, TS, KHL, EYH, EY, SH, ICK, SHP, CHC, GIY, YJ, JYW. All authors read and agreed to the published version of the manuscript.

SUPPLEMENTARY MATERIALS

Supplementary materials can be found via https://doi.org/10.4266/acc.000525.
Supplementary Table 1. Normal values for missing value imputation
Supplementary Table 2. Baseline demographic characteristics and laboratory values for both the non-event and event groups within the Internal Medicine cohorts
Supplementary Table 3. Baseline demographic characteristics and laboratory values for both the non-event and event groups within the Obstetrics and Gynecology cohorts
Supplementary Table 4. Univariate and multivariate logistic regression analyses of baseline EWSs (VC-MAES, NEWS, MEWS) for predicting prolonged length of stay
Supplementary Figure 1. A schematic of the VitalCare - Major Adverse Event Score model architecture. LSTM: long short-term memory; DNN: deep neural networks.

REFERENCES

1. Liu V, Kipnis P, Rizk NW, Escobar GJ. Adverse outcomes associated with delayed intensive care unit transfers in an integrated healthcare system. J Hosp Med. 2012; 7:224–30. DOI: 10.1002/jhm.964. PMID: 22038879.
2. Sykora D, Traub SJ, Buras MR, Hodgson NR, Geyer HL. Increased inpatient length of stay after early unplanned transfer to higher levels of care. Crit Care Explor. 2020; 2:e0103. DOI: 10.1097/cce.0000000000000103. PMID: 32426745.
3. Escobar GJ, Greene JD, Gardner MN, Marelich GP, Quick B, Kipnis P. Intra-hospital transfers to a higher level of care: contribution to total hospital and intensive care unit (ICU) mortality and length of stay (LOS). J Hosp Med. 2011; 6:74–80. DOI: 10.1002/jhm.817. PMID: 21290579.
4. Bapoje SR, Gaudiani JL, Narayanan V, Albert RK. Unplanned transfers to a medical intensive care unit: causes and relationship to preventable errors in care. J Hosp Med. 2011; 6:68–72. DOI: 10.1002/jhm.812. PMID: 21290577.
5. Le Lagadec MD, Dwyer T. Scoping review: the use of early warning systems for the identification of in-hospital patients at risk of deterioration. Aust Crit Care. 2017; 30:211–8. DOI: 10.1016/j.aucc.2016.10.003. PMID: 27863876.
6. Cummings BC, Ansari S, Motyka JR, Wang G, Medlin RP Jr, Kronick SL, et al. Predicting intensive care transfers and other unforeseen events: analytic model validation study and comparison to existing methods. JMIR Med Inform. 2021; 9:e25066. DOI: 10.2196/25066. PMID: 33818393.
7. Lee A, Bishop G, Hillman KM, Daffurn K. The medical emergency team. Anaesth Intensive Care. 1995; 23:183–6. DOI: 10.1177/0310057x9502300210. PMID: 7793590.
8. Smith ME, Chiovaro JC, O'Neil M, Kansagara D, Quinones A, Freeman M, et al. Early warning system scores: a systematic review [Internet]. Department of Veterans Affairs (US);2014. [cited 2025 Apr 1]. Available from: https://www.ncbi.nlm.nih.gov/pubmed/25506953.
9. Romero-Brufau S, Morlan BW, Johnson M, Hickman J, Kirkland LL, Naessens JM, et al. Evaluating automated rules for rapid response system alarm triggers in medical and surgical patients. J Hosp Med. 2017; 12:217–23. DOI: 10.12788/jhm.2712. PMID: 28411289.
10. Downey CL, Tahir W, Randell R, Brown JM, Jayne DG. Strengths and limitations of early warning scores: a systematic review and narrative synthesis. Int J Nurs Stud. 2017; 76:106–19. DOI: 10.1016/j.ijnurstu.2017.09.003. PMID: 28950188.
11. Haegdorens F, Van Bogaert P, Roelant E, De Meester K, Misselyn M, Wouters K, et al. The introduction of a rapid response system in acute hospitals: a pragmatic stepped wedge cluster randomised controlled trial. Resuscitation. 2018; 129:127–34. DOI: 10.1016/j.resuscitation.2018.04.018. PMID: 29679694.
12. Romero-Brufau S, Huddleston JM, Naessens JM, Johnson MG, Hickman J, Morlan BW, et al. Widely used track and trigger scores: are they ready for automation in practice? Resuscitation. 2014; 85:549–52. DOI: 10.1016/j.resuscitation.2013.12.017. PMID: 24412159.
13. Holland M, Kellett J. The United Kingdom’s National Early Warning Score: should everyone use it?: a narrative review. Intern Emerg Med. 2023; 18:573–83. DOI: 10.1007/s11739-022-03189-1. PMID: 36602553.
14. Hong N, Liu C, Gao J, Han L, Chang F, Gong M, et al. State of the art of machine learning-enabled clinical decision support in intensive care units: literature review. JMIR Med Inform. 2022; 10:e28781. DOI: 10.2196/28781. PMID: 35238790.
15. Greco M, Caruso PF, Cecconi M. Artificial intelligence in the intensive care unit. Semin Respir Crit Care Med. 2021; 42:2–9. DOI: 10.1055/s-0040-1719037. PMID: 33152770.
16. Salehinejad H, Meehan AM, Rahman PA, Core MA, Borah BJ, Caraballo PJ. Novel machine learning model to improve performance of an early warning system in hospitalized patients: a retrospective multisite cross-validation study. EClinicalMedicine. 2023; 66:102312. DOI: 10.1016/j.eclinm.2023.102312. PMID: 38192596.
17. Peelen RV, Eddahchouri Y, Koeneman M, van de Belt TH, van Goor H, Bredie SJ. Algorithms for prediction of clinical deterioration on the general wards: a scoping review. J Hosp Med. 2021; 16:612–9. DOI: 10.12788/jhm.3630. PMID: 34197299.
18. Nolan JP, Berg RA, Andersen LW, Bhanji F, Chan PS, Donnino MW, et al. Cardiac arrest and cardiopulmonary resuscitation outcome reports: update of the Utstein Resuscitation Registry Template for In-Hospital Cardiac Arrest: a consensus report from a Task Force of the International Liaison Committee on Resuscitation (American Heart Association, European Resuscitation Council, Australian and New Zealand Council on Resuscitation, Heart and Stroke Foundation of Canada, InterAmerican Heart Foundation, Resuscitation Council of Southern Africa, Resuscitation Council of Asia). Resuscitation. 2019; 144:166–77. DOI: 10.1161/cir.0000000000000710. PMID: 31536777.
19. Reese J, Deakyne SJ, Blanchard A, Bajaj L. Rate of preventable early unplanned intensive care unit transfer for direct admissions and emergency department admissions. Hosp Pediatr. 2015; 5:27–34. DOI: 10.1542/hpeds.2013-0102. PMID: 25554756.
20. Sung M, Hahn S, Han CH, Lee JM, Lee J, Yoo J, et al. Event prediction model considering time and input error using electronic medical records in the intensive care unit: retrospective study. JMIR Med Inform. 2021; 9:e26426. DOI: 10.2196/26426. PMID: 34734837.
21. Fang AH, Lim WT, Balakrishnan T. Early warning score validation methodologies and performance metrics: a systematic review. BMC Med Inform Decis Mak. 2020; 20:111. DOI: 10.1186/s12911-020-01144-8. PMID: 32552702.
22. Chen Y, Scholten A, Chomsky-Higgins K, Nwaogu I, Gosnell JE, Seib C, et al. Risk factors associated with perioperative complications and prolonged length of stay after laparoscopic adrenalectomy. JAMA Surg. 2018; 153:1036–41. DOI: 10.1001/jamasurg.2018.2648. PMID: 30090934.
23. Patel K, Diaz MJ, Taneja K, Batchu S, Zhang A, Mohamed A, et al. Predictors of inpatient admission likelihood and prolonged length of stay among cerebrovascular disease patients: a nationwide emergency department sample analysis. J Stroke Cerebrovasc Dis. 2023; 32:106983. DOI: 10.1016/j.jstrokecerebrovasdis.2023.106983. PMID: 36641949.
24. Finlayson SG, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med. 2021; 385:283–6. DOI: 10.1056/nejmc2104626. PMID: 34260843.
25. Cabitza F, Campagner A, Soares F, García de Guadiana-Romualdo L, Challa F, Sulejmani A, et al. The importance of being external: methodological insights for the external validation of machine learning models in medicine. Comput Methods Programs Biomed. 2021; 208:106288. DOI: 10.1016/j.cmpb.2021.106288. PMID: 34352688.
26. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med. 2021; 181:1065–70. DOI: 10.1001/jamainternmed.2021.2626. PMID: 34152373.
27. Byrd TF 4th, Southwell B, Ravishankar A, Tran T, Kc A, Phelan T, et al. Validation of a proprietary deterioration index model and performance in hospitalized adults. JAMA Netw Open. 2023; 6:e2324176. DOI: 10.1001/jamanetworkopen.2023.24176. PMID: 37486632.
28. Wu CL, Kuo CT, Shih SJ, Chen JC, Lo YC, Yu HH, et al. Implementation of an electronic national early warning system to decrease clinical deterioration in hospitalized patients at a tertiary medical center. Int J Environ Res Public Health. 2021; 18:4550. DOI: 10.3390/ijerph18094550. PMID: 33922991.
29. Churpek MM, Carey KA, Snyder A, Winslow CJ, Gilbert E, Shah NS, et al. Multicenter development and prospective validation of eCARTv5: a gradient-boosted machine-learning early warning score. Crit Care Explor. 2025; 7:e1232. DOI: 10.1097/cce.0000000000001232. PMID: 40138535.
30. Lambert SI, Madi M, Sopka S, Lenes A, Stange H, Buszello CP, et al. An integrative review on the acceptance of artificial intelligence among healthcare professionals in hospitals. NPJ Digit Med. 2023; 6:111. DOI: 10.1038/s41746-023-00852-5. PMID: 37301946.

Figure 1.
Flowchart of patient enrollment. ICU: intensive care unit.
Figure 2.
Comparison of predictive performance among the VitalCare - Major Adverse Event Score (VC-MAES [MAES]), Modified Early Warning Score (MEWS), and National Early Warning Score (NEWS) models within a 6-hour timeframe before adverse events. (A) Receiver operating characteristic curves with areas under the curve. (B) Precision-recall curves with areas under the curve. AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve.
Figure 3.
Comparison of the predictive performance of the VitalCare - Major Adverse Event Score in identifying clinical deterioration across 6- to 24-hour timeframes before adverse events. (A) Receiver operating characteristic curves with areas under the curve. (B) Precision-recall curves with areas under the curve. AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve.
Figure 4.
Comparison of predictive performance among the VitalCare - Major Adverse Event Score (VC-MAES [MAES]), Modified Early Warning Score (MEWS), and National Early Warning Score (NEWS) models within a 6-hour timeframe before adverse events in the internal medicine (IM; A, B) and obstetrics and gynecology (OBGYN; C, D) cohorts. (A) Receiver operating characteristic (ROC) curves with area under the ROC curve (AUROC) in IM. (B) Precision-recall curves with area under the precision-recall curve (AUPRC) in IM. (C) ROC curves with AUROC in OBGYN. (D) Precision-recall curves with AUPRC in OBGYN.
Table 1.
Baseline demographic characteristics and laboratory values of the non-event and event groups
| Variable | Overall | Non-event | Event | P-value |
|---|---|---|---|---|
| Number of encounters | 6,039 | 5,822 | 217 | - |
| Demographics | | | | |
| Age (yr) | 52 (37–68) | 51 (37–67) | 76 (68–83) | <0.001 |
| Sex | | | | <0.001 |
| Male | 1,274 (21.1) | 1,146 (19.7) | 128 (59.0) | |
| Female | 4,765 (78.9) | 4,676 (80.3) | 89 (41.0) | |
| Department | | | | - |
| Internal medicine | 2,177 (36.0) | 1,970 (33.8) | 207 (95.4) | |
| Obstetrics and gynecology | 3,862 (64.0) | 3,852 (66.2) | 10 (4.6) | |
| Height (cm) | 160.0 (155.2–165.0) | 160.0 (155.3–165.0) | 161.7 (155.0–168.0) | 0.033 |
| Weight (kg) | 60.8 (53.3–70.7) | 61.0 (53.6–71.0) | 55.0 (48.0–62.0) | <0.001 |
| Body mass index (kg/m²) | 23.9 (21.3–26.8) | 24.0 (21.4–26.9) | 21.5 (18.5–23.8) | <0.001 |
| First MAES | 1.5 (0.7–3.5) | 1.4 (0.7–3.2) | 6.2 (3.2–16.0) | <0.001 |
| First NEWS | 1.0 (0.0–2.0) | 1.0 (0.0–2.0) | 3.0 (2.0–6.0) | <0.001 |
| First MEWS | 1.0 (1.0–2.0) | 1.0 (1.0–2.0) | 2.0 (1.0–4.0) | <0.001 |
| Vital signs | | | | |
| Pulse (/min) | 82.3±14.9 | 82.0±14.6 | 92.4±18.4 | <0.001 |
| Respiratory rate (/min) | 19.6±2.4 | 19.6±2.1 | 22.2±5.4 | <0.001 |
| Systolic blood pressure (mm Hg) | 124.6±17.3 | 124.5±17.2 | 127.7±20.7 | 0.028 |
| Diastolic blood pressure (mm Hg) | 73.0±11.9 | 73.0±11.9 | 73.4±13.5 | 0.630 |
| Body temperature (°C) | 36.8±0.4 | 36.8±0.4 | 37.0±0.6 | <0.001 |
| Oxygen saturation (%) | 97.4±2.2 | 97.5±2.1 | 96.5±4.3 | 0.001 |
| Laboratory | | | | |
| Total bilirubin (mg/dl) | 0.5±0.4 | 0.5±0.4 | 0.7±0.4 | <0.001 |
| Lactate (mmol/L) | 1.6±1.5 | 1.5±1.3 | 2.1±1.9 | <0.001 |
| pH | 7.4±0.1 | 7.4±0.1 | 7.4±0.1 | 0.014 |
| Sodium (mmol/L) | 136.4±3.9 | 136.5±3.7 | 134.5±6.5 | <0.001 |
| Potassium (mmol/L) | 4.3±0.5 | 4.3±0.5 | 4.4±0.8 | 0.012 |
| Creatinine (mg/dl) | 1.0±1.1 | 0.9±0.9 | 1.9±2.3 | <0.001 |
| Hematocrit (%) | 35.6±5.2 | 35.7±5.2 | 34.6±6.6 | 0.015 |
| White blood cell count (10³/µl) | 9.1±4.7 | 9.0±4.5 | 11.6±7.5 | <0.001 |
| HCO3 (mmol/L) | 24.0±5.2 | 24.1±4.9 | 23.6±6.0 | 0.272 |
| Platelets (10³/µl) | 233.8±83.1 | 233.9±81.2 | 232.8±123.5 | 0.896 |
| C-reactive protein (mg/dl) | 3.3±6.1 | 2.8±5.5 | 9.4±9.0 | <0.001 |

Values are presented as median (interquartile range), number (%), or mean±standard deviation.

MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score; HCO3: bicarbonate.

Table 2.
AUROC and AUPRC comparison results within 6, 12, and 24 hours preceding the event
| EWS | Timeframe before event (hr) | AUROC (95% CI) | AUPRC (95% CI) |
|---|---|---|---|
| MAES | 6 | 0.918 (0.912–0.924) | 0.353 (0.330–0.375) |
| NEWS | 6 | 0.797 (0.784–0.810) | 0.124 (0.110–0.139) |
| MEWS | 6 | 0.722 (0.707–0.737) | 0.079 (0.069–0.090) |
| MAES | 12 | 0.915 (0.910–0.920) | 0.337 (0.319–0.355) |
| NEWS | 12 | 0.777 (0.766–0.788) | 0.143 (0.131–0.155) |
| MEWS | 12 | 0.691 (0.678–0.704) | 0.095 (0.085–0.105) |
| MAES | 24 | 0.908 (0.904–0.912) | 0.333 (0.320–0.346) |
| NEWS | 24 | 0.758 (0.750–0.767) | 0.159 (0.149–0.171) |
| MEWS | 24 | 0.671 (0.660–0.681) | 0.112 (0.103–0.120) |

AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; EWS: early warning score; CI: confidence interval; MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score.
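
As a rough illustration of the two metrics reported in Table 2, the sketch below computes AUROC as the probability that a randomly chosen event window outscores a randomly chosen non-event window (ties count one half), and AUPRC as average precision, one common step-wise estimate of the area under the precision-recall curve. The source does not state which AUPRC estimator or software was used, so the function names and the toy data are assumptions made for illustration only.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney rank statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = 0.0
    area = 0.0
    # Lower the alert threshold one distinct score at a time.
    for thr in sorted(set(scores), reverse=True):
        for s, y in zip(scores, labels):
            if s == thr:
                tp += y
                fp += 1 - y
        recall = tp / n_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical toy data, not study data: 2 events among 6 scored windows.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))              # 0.875
print(average_precision(toy_scores, toy_labels))  # about 0.833
```

A score with no discriminative ability has an expected AUPRC close to the event prevalence, which is one reason AUPRC values for rare outcomes sit far below the corresponding AUROC values.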

Table 3.
Performance metrics for MAES, NEWS, and MEWS across key cutoffs
| EWS | Cutoff | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | F1-score, % (95% CI) |
|---|---|---|---|---|---|---|
| MAES | 30 | 65.8 (63.5–67.9) | 94.3 (94.1–94.4) | 7.1 (6.9–7.4) | 99.8 (99.7–99.8) | 12.9 (12.4–13.3) |
| MAES | 40 | 58.6 (56.3–60.1) | 96.9 (96.8–97.0) | 11.2 (10.7–11.6) | 99.7 (99.7–99.7) | 18.8 (18.1–19.4) |
| MAES | 50 | 52.4 (50.0–54.6) | 98.3 (98.3–98.4) | 17.5 (16.8–18.1) | 99.7 (99.7–99.7) | 26.2 (25.1–27.2) |
| MAES | 70 | 37.5 (35.3–39.3) | 99.6 (99.6–99.6) | 39.5 (37.2–41.5) | 99.6 (99.6–99.6) | 38.5 (36.4–40.4) |
| NEWS | 4 | 52.3 (50.1–54.7) | 93.3 (93.2–93.4) | 5.0 (4.8–5.1) | 99.7 (99.6–99.7) | 9.1 (8.7–9.5) |
| NEWS | 5 | 45.2 (42.9–47.5) | 96.4 (96.3–96.5) | 7.8 (7.4–8.2) | 99.6 (99.6–99.6) | 13.3 (12.6–14.0) |
| NEWS | 6 | 36.9 (34.8–39.3) | 98.1 (98.0–98.1) | 11.4 (10.8–12.0) | 99.6 (99.6–99.6) | 17.4 (16.4–18.4) |
| NEWS | 7 | 28.5 (26.3–30.4) | 99.0 (99.0–99.1) | 16.3 (15.1–17.3) | 99.5 (99.5–99.5) | 20.7 (19.2–22.0) |
| MEWS | 3 | 48.4 (46.2–50.4) | 92.4 (92.3–92.5) | 4.1 (3.9–4.2) | 99.6 (99.6–99.6) | 7.5 (7.2–7.8) |
| MEWS | 4 | 36.5 (34.4–38.6) | 96.9 (96.8–97.0) | 7.3 (6.9–7.7) | 99.6 (99.5–99.6) | 12.2 (11.5–12.8) |
| MEWS | 5 | 26.4 (24.3–28.4) | 98.8 (98.7–98.8) | 12.5 (11.6–13.4) | 99.5 (99.5–99.5) | 16.9 (15.7–18.2) |

MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score; EWS: early warning score; PPV: positive predictive value; NPV: negative predictive value.
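
The metrics in Table 3 all derive from the four cells of a confusion matrix at a given cutoff. The sketch below shows the standard definitions, plus the identity FP/TP = (1 − PPV)/PPV, which links the reported PPV to false positives per true positive (FPpTP). The function names and the example counts are hypothetical; the study's own counts and the cutoff pairing used for its FPpTP comparison are not given in this section.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as fractions (not %)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1_from(sensitivity, ppv),
    }


def f1_from(sensitivity, ppv):
    """F1-score: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def fp_per_tp(ppv):
    """False positives per true positive, FP/TP = (1 - PPV) / PPV."""
    return (1.0 - ppv) / ppv


# Hypothetical counts, not study data.
m = classification_metrics(tp=50, fp=150, tn=9750, fn=50)
print(m["ppv"], fp_per_tp(m["ppv"]))  # 0.25 3.0

# Consistency check against one Table 3 row (MAES, cutoff 70):
# sensitivity 37.5% and PPV 39.5% give the reported F1-score of 38.5%.
print(round(f1_from(0.375, 0.395), 3))  # 0.385

# FPpTP implied by a reported PPV (MAES, cutoff 50, PPV 17.5%).
print(round(fp_per_tp(0.175), 1))  # 4.7
```

Because FPpTP depends only on PPV, a higher PPV at a comparable sensitivity translates directly into fewer false alarms per correctly flagged patient.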

Table 4.
Univariate and multivariate logistic regression analyses of baseline EWSs (MAES, NEWS, MEWS) for predicting major adverse events
| EWS | Univariate OR (95% CI) | Univariate P-value | Multivariate OR (95% CI), adjusted for age, sex, and BMI | Multivariate P-value |
|---|---|---|---|---|
| Baseline MAES | 2.065 (1.871–2.280) | <0.001 | 1.484 (1.333–1.651) | <0.001 |
| Baseline NEWS | 1.051 (1.045–1.057) | <0.001 | 1.032 (1.026–1.038) | <0.001 |
| Baseline MEWS | 1.064 (1.054–1.073) | <0.001 | 1.041 (1.032–1.052) | <0.001 |

EWS: early warning score; MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score; BMI: body mass index; OR: odds ratio.
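
To help read the odds ratios in Table 4, the sketch below converts an OR and its 95% CI back to a log-odds coefficient, an approximate standard error, and a z-statistic, and shows how a per-unit OR compounds over several units. It assumes the intervals are Wald-type (symmetric on the log-odds scale) and that each OR refers to a one-unit increase in the baseline score; neither assumption is stated in this section, and the function names are hypothetical.

```python
import math


def log_odds_summary(or_point, ci_low, ci_high, z_crit=1.96):
    """Recover beta = ln(OR), its approximate SE, and z from an OR and 95% CI.

    Assumes a Wald interval: ln(OR) +/- z_crit * SE.
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se, beta / se


def odds_multiplier(or_per_unit, delta):
    """Factor by which the odds change for a `delta`-unit increase."""
    return or_per_unit ** delta


# Univariate baseline MAES from Table 4: OR 2.065 (1.871-2.280).
beta, se, z = log_odds_summary(2.065, 1.871, 2.280)
print(round(beta, 3), round(z, 1))  # about 0.725 and 14.4

# Adjusted baseline NEWS OR of 1.032, compounded over a 5-unit increase.
print(round(odds_multiplier(1.032, 5), 2))  # about 1.17
```

The ORs for the three scores are not directly comparable in size unless the scores share a scale, since each OR is expressed per unit of its own score.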
