DISCUSSION
The present study demonstrated that APACHE II had poor calibration and modest discrimination among critically ill Korean patients. The APACHE II also exhibited poor calibration and poor to modest discrimination in groups divided according to surgical status at ICU admission. At the predicted risk of 50%, the APACHE II had a sensitivity and specificity of 36.6% and 87.4% for hospital mortality, respectively.
Developed in 1985 using a database of North American ICU patients, the APACHE II calculates a score from the most extreme values of 12 physiological variables recorded during the first 24 hours after ICU admission, chronic health status, age, and the Glasgow coma scale.[1] Hospital mortality is predicted from the APACHE II score, the principal diagnostic category leading to ICU admission, and whether the patient required emergency surgery.[1] In the original article, no goodness-of-fit testing was reported for calibration, and the aROC for the APACHE II was 0.863.[1]
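As an illustration of how a predicted risk is obtained from the score, the sketch below uses the logistic equation as it is commonly cited from the original APACHE II publication (intercept -3.517, 0.146 per score point, 0.603 for emergency surgery). The function name and the `diagnostic_weight` parameter are our own illustrative choices; the category weights themselves are not reproduced here and should be taken from reference 1, against which the coefficients should also be checked.

```python
import math


def apache2_predicted_mortality(score, diagnostic_weight, emergency_surgery=False):
    """Predicted risk of hospital death from an APACHE II score.

    score: APACHE II score (0-71).
    diagnostic_weight: coefficient for the principal diagnostic category,
        looked up in the original publication (assumed to be supplied by the caller).
    emergency_surgery: True if the patient was admitted after emergency surgery.
    """
    # Coefficients as commonly cited from the original article; verify before use.
    logit = -3.517 + 0.146 * score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603
    # Inverse logit converts log-odds into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-logit))
```

With a diagnostic weight of zero, the predicted risk crosses 50% at a score of about 24 (3.517 / 0.146), which gives a sense of how high a score must be before a patient is classified as a predicted death at the 50% cut-off.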
External validation is essential before routine application of any model to a group of subjects that differs from the group originally used for model development.[11] Although the APACHE II has been extensively validated in different regions of the world, it has never been validated in critically ill East Asian patients. It is popular in Korea because of its ease of use and availability, and is often used to assess severity of illness and to predict mortality in Korean ICU patients.
To our knowledge, this is the first study to attempt to validate the APACHE II score in a large, multicenter, prospectively collected database in East Asia. In this study, the performance of the APACHE II was poor not only in general ICU patients, but also in patient groups divided according to surgical status. These results are consistent with studies conducted in other regions of the world that have attempted to validate the APACHE II at various times. Most of the studies published since 1996 showed that the APACHE II had modest to good discrimination and poor calibration in general ICUs.[12-14] The only exception was a study conducted prospectively at a single center in Italy from 1994 to 1997.[15] That study sample consisted mainly of surgical patients, and patients with an ICU length of stay <24 hr were excluded. Furthermore, studies of the APACHE II that included both medical and surgical patients have shown poor power for predicting hospital mortality.[16,17]
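The two performance measures discussed above can be sketched as follows. Discrimination is computed as the area under the ROC curve by direct pairwise comparison. Calibration is illustrated with a Hosmer-Lemeshow-type statistic over groups of patients sorted by predicted risk; this is one common goodness-of-fit choice and is an assumption here, since this section does not name the test that was used. Function names and the default of 10 groups are illustrative.

```python
def auroc(risks, died):
    """Probability that a random non-survivor has a higher predicted risk
    than a random survivor (ties count as one half)."""
    pos = [r for r, d in zip(risks, died) if d]
    neg = [r for r, d in zip(risks, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(risks, died, groups=10):
    """Hosmer-Lemeshow-type statistic: sum over risk-ordered groups of
    (observed - expected deaths)^2 / (n * mean_risk * (1 - mean_risk))."""
    pairs = sorted(zip(risks, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Split the sorted patients into roughly equal-sized groups.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(r for r, _ in chunk)
        observed = sum(1 for _, d in chunk if d)
        mean_risk = expected / len(chunk)
        variance = len(chunk) * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

A model can rank patients well (high aROC) while still assigning risks that are systematically too high or too low, which shows up as a large goodness-of-fit statistic. This is the pattern of modest to good discrimination with poor calibration described above.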
There are several reasons for the limited usefulness of the APACHE II score in predicting mortality for critically ill Korean patients today. First, the poor performance may reflect differences between the population from which the original APACHE II sample was derived and current Korean patients. As stated, the APACHE II was developed mainly using patients from North American ICUs, who descended from different populations and had different underlying diseases and different ICU organizational structures from Korean patients. Second, even if the APACHE II had performed well for Koreans in the 1980s when it was developed, it may not be accurate now. Since then, there have been significant changes in the prevalence of major diseases, diagnostic approaches, and therapeutic modalities. For example, therapeutic modalities such as liver transplantation, continuous renal replacement therapy (CRRT), early goal-directed therapy, and low tidal volume ventilation to reduce ventilator-induced lung injury were not available at that time.[1] Prediction models should be updated periodically to reflect changes in medical practice and case-mix over time.[6] Third, the diagnostic category used to adjust for mortality in the APACHE II is vague.[1] More and more patients have several concurrent, complex problems, and selecting only one principal diagnostic category may be very difficult. In addition, some patients (e.g., liver transplant recipients) may be grouped with patients with very different prognoses (e.g., elective abdominal surgery patients) because these kinds of patients were not seen in the ICU at the time when the APACHE II was developed. Fourth, the APACHE II is subject to lead-time bias. The physiological variables included in any scoring system are all dynamic and can be influenced by multiple factors, including recent and ongoing resuscitation and therapy.[18] The interpretation of outcome prediction scores needs to take this potential bias into account, especially given the recent emphasis on early resuscitation. The APACHE II is vulnerable to the Boyd and Grounds effect,[1,19] which increases the risk of overestimation of predicted mortality. A recently developed scoring system tried to eliminate this effect by using only data collected during the first hour of ICU stay.
In this study, a commonly used cut-off of 50% was used to evaluate the predictive value of the APACHE II for hospital mortality.[10] At this cut-off point, the sensitivity and specificity were 36.6% and 87.4%, respectively. The specificity of the APACHE II for hospital mortality in this cohort was lower than that reported by Capuzzo et al.[15] If we treat the APACHE II as a diagnostic test, the optimal cut-off point for hospital mortality in our study can be calculated using the Youden index, giving a predicted mortality of 22%.[20] At this cut-off point, the sensitivity, specificity, positive predictive value, and negative predictive value were 73.9%, 63.8%, 33.1%, and 91%, respectively.
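The Youden index is J = sensitivity + specificity - 1. From the figures reported above, J is 0.366 + 0.874 - 1 = 0.240 at the 50% cut-off and 0.739 + 0.638 - 1 = 0.377 at the 22% cut-off. The following is a minimal sketch of how such a cut-off could be found from predicted risks and observed outcomes; the function names and the rule that a patient is classified as a predicted death when the risk is at or above the cut-off are assumptions for illustration, not the procedure documented in this study.

```python
def confusion_metrics(risks, died, cutoff):
    """Sensitivity, specificity, PPV and NPV when risk >= cutoff predicts death."""
    tp = fp = tn = fn = 0
    for r, d in zip(risks, died):
        predicted_death = r >= cutoff
        if predicted_death and d:
            tp += 1
        elif predicted_death and not d:
            fp += 1
        elif not predicted_death and d:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_cutoff(risks, died):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Every distinct predicted risk is tried as a candidate cut-off; the lowest
    cut-off wins if several give the same J. Requires at least one death and
    one survivor so that both sensitivity and specificity are defined.
    """
    best_cutoff, best_j = None, float("-inf")
    for c in sorted(set(risks)):
        m = confusion_metrics(risks, died, c)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j
```

Lowering the cut-off from 50% to 22% trades specificity for sensitivity. Because positive and negative predictive values also depend on the mortality rate of the cohort, they do not transfer directly to populations with a different case-mix.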
In this study, although the APACHE II score was associated with hospital mortality in the univariate model, it showed only borderline statistical significance in the multivariate model (p=0.067). This may be because the components of the APACHE II score include many of the strong outcome predictors in our model, such as age, gender, and comorbidities, although testing for collinearity did not show significant collinearity between the APACHE II predicted mortality and any of these variables. Another possible reason is that the APACHE II score is no longer an accurate predictor of hospital mortality in Korean ICU patients, as described above.
One limitation of this study is that the data were drawn from large university hospitals; therefore, the conclusions may not be applicable to patients treated at smaller hospitals in Korea. Also, certain populations that use the ICU (for example, neurosurgery patients or coronary care unit patients) were not represented in this study, which limits the applicability of our findings to these patient populations. However, this is the first study to prospectively collect data from nine university-affiliated hospitals and to provide a good case-mix of both medical and surgical ICU patients, which increases the generalizability of our findings.
In conclusion, the APACHE II prognostic model has poor calibration and modest discriminative power when applied to ICU patients in Korea. Therefore, a new prognostic model that is customized to Korean patients or an updated model is needed.