The aim of this study was to develop machine learning (ML) and initial nursing assessment (INA)-based emergency department (ED) triage to predict adverse clinical outcomes.
The retrospective study included ED visits between January 2016 and December 2017, with intensive care unit admission or emergency room death as the outcome. We trained four classifiers, applying logistic regression and a deep learning model to INA and to low dimensional (LD) INA. For comparison, we fitted logistic regression on the Korea Triage and Acuity Scale (KTAS) and on the Sequential Organ Failure Assessment (SOFA) score. We varied the outcome ratio to test the models on populations of varying severity. Finally, variables of importance were identified using the random forest model's information gain. The four most influential variables were used for LD modeling for efficiency.
A total of 86,309 patient visits were included, with an overall outcome rate of 3.5%. With logistic regression, the area under the curve (AUC) values were 76.8 (74.9–78.6) for the KTAS model and 74.0 (72.1–75.9) for the SOFA model, while the AUC values of the INA model were 87.2 (85.9–88.6) and 87.6 (86.3–88.9) with logistic regression and deep learning, respectively, suggesting that the ML and INA-based triage system predicted the outcomes more accurately. The AUC values for the LD model were 81.2 (79.4–82.9) and 80.7 (78.9–82.5) for logistic regression and deep learning, respectively.
We developed an ML and INA-based triage system for EDs. The novel system was able to predict clinical outcomes more accurately than existing triage systems, KTAS and SOFA.
An emergency department (ED) is a complex scene where various diseases and processes are intertwined. Annually, over 4.8 million patients visit EDs in Korea, and 137.8 million visit EDs in the United States [
Triage systems have been developed where demand is greater than supply [
Digitalized triage systems have been introduced to support triage decisions by healthcare providers [
One of the most important aspects of care in an ED is the initial assessment of a patient's condition. Nurses who encounter patients first measure their condition so that they can identify and manage their physical, mental, or social problems [
The aim of this study was to evaluate an ML and INA-based ED triage system to predict adverse clinical outcomes.
This study was a single-center, retrospective study, conducted in an ED of a tertiary academic hospital (a 1,960-bed, university-affiliated hospital located in a metropolitan city with an annual census of 70,000) [
The study subjects were defined as ED visitors from January 1, 2016 to December 31, 2017.
We excluded patients who were younger than 18 years, were dead on arrival (DOA), died after cardiopulmonary resuscitation (CPR), or presented with injury. Visits with missing laboratory data were also excluded from the Sequential Organ Failure Assessment (SOFA) score calculation. The process of selecting patients is illustrated in
The Institutional Review Board of Samsung Medical Center approved this study. Informed consent was exempted because this was a retrospective, observational, and deidentified study (No. SMC 2018-11-007).
Data were extracted from a clinical data warehouse (CDW) and comprised age, gender, level of consciousness, route of arrival, method of transportation, weekend, day of the week, vital signs (temperature, heart rate, systolic blood pressure, respiratory rate, oxygen saturation), and the initial KTAS score assigned by nursing staff. There are two types of KTAS, namely, initial assessment and reassessment of the condition of a patient. We used the initial KTAS for this study. We also calculated the SOFA score from the partial pressure of oxygen (PaO2) and fraction of inspired oxygen (FiO2) for respiration, platelet count for coagulation, bilirubin for liver, mean arterial pressure for the cardiovascular system, Glasgow Coma Scale (GCS) for the central nervous system, and creatinine or urine output for the renal system. The SOFA score quantifies the number and severity of dysfunctions in six organ systems (respiration, coagulation, liver, cardiovascular, central nervous system, and renal). Each organ system is assigned a point value from 0 (normal) to 4 (high degree of dysfunction/failure) [
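As a minimal sketch of the aggregation described above, the total SOFA score can be obtained by summing the six organ-system sub-scores, each between 0 and 4, giving a total between 0 and 24. The function name, dictionary keys, and example values below are illustrative assumptions and are not taken from the study; mapping raw laboratory values to each sub-score is deliberately left out.

```python
from typing import Mapping

# The six organ systems named in the text (key names are illustrative).
SOFA_SYSTEMS = (
    "respiration", "coagulation", "liver",
    "cardiovascular", "cns", "renal",
)


def sofa_total(subscores: Mapping[str, int]) -> int:
    """Sum the six organ sub-scores after checking each lies in 0..4."""
    total = 0
    for system in SOFA_SYSTEMS:
        points = subscores[system]
        if points not in range(5):
            raise ValueError(f"{system} sub-score must be 0-4, got {points}")
        total += points
    return total


# Hypothetical patient, not real data.
example = {
    "respiration": 2, "coagulation": 0, "liver": 1,
    "cardiovascular": 1, "cns": 0, "renal": 3,
}
print(sofa_total(example))  # 7
```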
Our primary outcome was a composite of death in the ED or intensive care unit (ICU) admission. This composite outcome was used as the target feature when building the model.
We built the prediction model to quantify the probability of clinical outcome. A patient's likelihood of outcome may serve as a proxy for acuity which could be comparable with KTAS and SOFA scores.
All data processing and statistical analysis were conducted using R version 3.5.0 software (
We divided the patients into three sets, namely, training, validation, and test sets, for modeling, model parameter tuning, and evaluation, respectively.
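The three-way split can be sketched as follows. This is an illustration only: it assumes a simple random 60/20/20 split with a fixed seed, whereas the text does not state whether the split was random, stratified, or chronological. The function name and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_20_20(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the rows and return (training, validation, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_val = round(0.2 * n)    # used for hyper-parameter tuning
    n_test = round(0.2 * n)   # held out for the final evaluation
    n_train = n - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Visit identifiers 0..86308 stand in for the 86,309 included ED visits.
train, val, test = split_60_20_20(range(86309))
print(len(train), len(val), len(test))  # 51785 17262 17262
```

Only the training set should be used for fitting, with the validation set reserved for choosing hyper-parameters and the test set touched once for the reported metrics.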
Multivariate logistic regression analysis was conducted using the R function ‘glm’ to estimate the likelihood of clinical outcomes after adjusting for outcome ratio and other potential factors, and to determine which variables had the greatest effect on the outcome. In addition, deep learning, an ML method known for good classification performance, was applied using the R package ‘Keras’. The number of layers and the number of hidden units were treated as hyper-parameters and were selected using the validation set. Receiver operating characteristic (ROC) curves were generated by varying the threshold applied to each model's predicted probability. Finally, several models were compared, and the best prediction models were selected based on their area under the ROC curve (AUROC) values. The AUROC of each model is reported with its 95% confidence interval (CI). We also used variable importance plots from random forest to determine which variables affect the results. Chief complaint, age, heart rate, and SpO2 were the most influential factors for predicting clinical outcomes. Those variables were used for low dimensional (LD) modeling for efficiency.
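To make the evaluation metric concrete, the sketch below computes the AUROC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, and attaches a percentile bootstrap 95% CI. The bootstrap is an assumption made for illustration; the text does not state which method was used to obtain the confidence intervals. Function names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUROC (assumed method, see lead-in)."""
    auroc(labels, scores)  # fails early if one class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example, not study data.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p))  # 0.75
```

The pairwise loop is quadratic in the number of cases, so a rank-based formulation would be preferable for a data set of the size used in this study.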
To compare patient acuity, we cut each patient's model-predicted likelihood at thresholds chosen so that the proportion of patients at each level matched the observed KTAS level distribution, yielding a model-based KTAS. We then compared the nurse-assigned KTAS with the ML-based KTAS. In addition to the contingency table comparison, we show a matrix heatmap comparing the two KTAS assignments.
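One way to construct such a model-based KTAS is sketched below: patients are ranked by predicted probability, and levels are handed out in order so that each level receives exactly as many patients as the nurse-assigned KTAS gave it. The sketch assumes that level 1 denotes the highest acuity and that ties are broken by input order; the function name and toy values are hypothetical and do not come from the study.

```python
from collections import Counter
from typing import List, Sequence


def model_based_ktas(probs: Sequence[float], ktas: Sequence[int]) -> List[int]:
    """Assign KTAS-like levels from predicted risk, matching the observed level counts."""
    counts = Counter(ktas)
    # Indices sorted from highest to lowest predicted risk.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    assigned = [0] * len(probs)
    start = 0
    for level in sorted(counts):  # level 1 (assumed most urgent) first
        for i in order[start:start + counts[level]]:
            assigned[i] = level
        start += counts[level]
    return assigned


# Toy example, not study data.
probs = [0.90, 0.05, 0.40, 0.10, 0.70, 0.02]
ktas = [2, 4, 3, 3, 1, 5]
ml_ktas = model_based_ktas(probs, ktas)
print(ml_ktas)  # [1, 4, 3, 3, 2, 5]

# Contingency table of (nurse KTAS, model-based KTAS) pairs.
table = Counter(zip(ktas, ml_ktas))
```

The resulting `table` holds the cell counts that a contingency table or heatmap of the two assignments would display.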
Because there was a class imbalance in which less than 15% of the total cases were positive, we considered the Synthetic Minority Over-sampling Technique (SMOTE) to solve this imbalance problem [
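The core idea of SMOTE can be sketched as follows: each synthetic minority case is placed at a random point on the line segment between an existing minority case and one of its nearest minority-class neighbours. This is a simplified illustration that assumes purely numeric features and Euclidean distance; categorical variables such as chief complaint would need separate handling, and the function name and parameters are hypothetical rather than those of any particular library.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the chosen base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class on the corners of the unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(corners, n_new=10, k=2)
print(len(new_points))  # 10
```

Over-sampling of this kind should be applied to the training set only, so that the validation and test sets keep the original outcome rate.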
Descriptive statistics were used for the demographic features and characteristics of the ED visits. Categorical variables are expressed in counts and percentages of the total amount of data available within the database.
The initial data comprised 145,784 ED visits. After restricting to individuals aged ≥18 years (n = 115,904 remaining), we excluded those who were DOA or had died after CPR and cancelled cases (n = 107,434 remaining), followed by injury cases (n = 88,705 remaining). A further 2,396 visits with missing vital sign information were then excluded. Finally, data on 86,309 ED visits were included in the study. There were 51,785 (60.0%) patients in the training set, 17,262 (20.0%) in the validation set, and 17,262 (20.0%) in the testing set.
The demographics of the ED patients, divided into three groups, are shown in
The outcomes among different severity groups were analyzed and compared separately.
The AUROC values for the KTAS-only and SOFA-only models were 76.8 (74.9–78.6) and 74.0 (72.1–75.9) with logistic regression, while the AUROC values for the INA model were 87.2 (85.9–88.6) and 87.6 (86.3–88.9) with logistic regression and deep learning, respectively, suggesting that the INA-based triage system predicted outcomes more accurately. The AUROC values for the LD model using the four most influential features were 81.2 (79.4–82.9) and 80.7 (78.9–82.5) with logistic regression and deep learning, respectively, indicating that the efficiency model also outperformed the KTAS and SOFA models (
Varying the outcome ratio from 1 to 3 showed consistent results as shown in
Variable importance plots in random forest were also used to estimate the impact of features. The results are shown in
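For readers unfamiliar with information gain, the quantity behind an entropy-based importance measure can be sketched as the reduction in label entropy achieved by splitting the cases on one feature. The example below is a minimal illustration for a single binary split; a random forest would accumulate such gains over many splits and trees, and the exact importance measure reported by the random forest implementation used here is not specified beyond "information gain". Function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    items: List[int] = list(labels)
    n = len(items)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(items):
        p = items.count(cls) / n
        h -= p * math.log2(p)
    return h


def information_gain(labels: Sequence[int], goes_left: Sequence[bool]) -> float:
    """Entropy of the parent node minus the weighted entropy of its two children."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Toy outcome labels: 1 = adverse outcome, 0 = no adverse outcome.
outcome = [1, 1, 0, 0]
perfect_split = [True, True, False, False]   # separates the classes completely
useless_split = [True, False, True, False]   # leaves both children mixed
print(information_gain(outcome, perfect_split))
print(information_gain(outcome, useless_split))
```

A feature whose splits separate outcomes well yields a gain close to the parent entropy, while an uninformative feature yields a gain close to zero.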
There were some limitations of this study. First, an external validation was not performed. Patients' characteristics differ among institutions, and additional learning would be needed to generalize the algorithm from this study. To test the performance of the algorithm with a population of varying severity, we used a SMOTE method for over-sampling and under-sampling.
Second, because the model uses only the initial characteristics of patients visiting the emergency room (ER), it does not reflect physicians' notes or laboratory results generated later. However, our study aimed to determine which care process the triage system should assign the patient to, so the information generated later will be analyzed in a future study.
Finally, a cross-sectional study does not reflect medical history. We can set the time window and consider the medical history during the time interval. For example, we can consider the number of hospitalizations, outpatient visits, procedures or surgeries during 2 years. It is necessary to study the model that predicts the outcome by further utilizing the patient's medical history information.
In this study, we developed an ML tool and evaluated its performance in EDs, using KTAS and SOFA as comparators. Our results show that INA-ML is the most suitable for clinical outcome (ER death or ICU admission) prediction. However, the result was largely independent of model type: there was no significant difference between logistic regression and the other ML methods.
Once the effect of each feature in the model has been estimated, the model can easily be incorporated into an electronic medical record system and can calculate a score immediately after common patient information has been entered. Our model was designed for clinical decision support, and the score is calculated immediately from the first data recorded on a patient's arrival. It is not intended to replace nurse or physician judgement entirely, but to support it by providing the score as they assess a patient.
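As a sketch of how such a score could be computed at the point of care, a fitted logistic regression reduces to an intercept plus a weighted sum of the recorded features, passed through the logistic function. The coefficients and feature names below are invented placeholders chosen only to make the example run; they are not the fitted values from this study and must not be used clinically.

```python
import math
from typing import Mapping

# HYPOTHETICAL coefficients for illustration only (not from the study).
INTERCEPT = -4.0
COEFFICIENTS = {
    "age": 0.02,          # per year
    "heart_rate": 0.015,  # per beat/min
    "spo2": -0.01,        # per percentage point
}


def risk_score(features: Mapping[str, float],
               coefs: Mapping[str, float] = COEFFICIENTS,
               intercept: float = INTERCEPT) -> float:
    """Predicted outcome probability from a logistic model: 1 / (1 + exp(-z))."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patients differing only in heart rate.
calm = {"age": 60.0, "heart_rate": 80.0, "spo2": 98.0}
tachycardic = {"age": 60.0, "heart_rate": 140.0, "spo2": 98.0}
print(round(risk_score(calm), 3), round(risk_score(tachycardic), 3))
```

Because the calculation needs only the stored coefficients and the first recorded values, it can run as soon as the initial nursing assessment is entered.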
KTAS and INA serve two different purposes: KTAS can be used as a proxy measure of patient severity, and INA captures common patient information. Combining these two attributes provides additional information. The advantages of the combined model over the reference standard KTAS include evidence-based development and decreased reliance on subjective human experience or judgement [
First, even though speed and accuracy are the most important factors in ER care, not many studies on the initial information of emergency patients have been conducted. Second, we did not use additional information because there was no tool for this. We can upgrade the triage by making good use of structured data through this study. Third, we did not use only one method [
We have developed an ML and INA-based triage system for EDs. The novel system was able to predict clinical outcomes more accurately than existing triage systems, namely, KTAS and SOFA.
No potential conflict of interest relevant to this article was reported.
Supplementary materials can be found via
Chief complaint frequency table
Varying clinical outcome ratio and AUROC
Variable importance plot from random forest. CC: chief complaint, HR: heart rate, RR: respiratory rate, ER: emergency room, SBP: systolic blood pressure, BT: body temperature, AVPU: alert, voice, pain, unresponsive.
Values are presented as mean ± standard deviation or number (%).
ER: emergency room, OPD: outpatient department, SBP: systolic blood pressure.
a Non-alert: verbal, painful, unresponsive.
Values are presented as number (%).
ED: emergency department, KTAS: Korea Triage and Acuity Scale, ICU: intensive care unit.
AUROC: area under the receiver operating characteristic curve, KTAS: Korea Triage and Acuity Scale, SOFA: Sequential Organ Failure Assessment, INA: initial nursing assessment, LD INA: low-dimensional initial nursing assessment.