INTRODUCTION
In laboratory medicine, “standardization” refers to the process of aligning methods and measurement procedures to an established set of standards, often set by international bodies [1]. Such alignment ensures that results from different laboratories are comparable and consistent, regardless of where or how the tests are conducted. In contrast, “harmonization” involves the achievement of equivalent reported values among methods and measurements, using, but not necessarily adhering to, a standard [1]. Harmonization focuses on minimizing differences in results generated by different laboratories and measurement procedures. The terms standardization and harmonization are frequently used interchangeably, as they share a common ultimate goal, i.e., to deliver laboratory results to stakeholders, such as clinicians and patients, that are comparable among laboratories and over time [2].
The importance of standardization and harmonization in healthcare is profound. These processes ensure the reliability and comparability of laboratory results, which are crucial for accurate diagnosis, treatment planning, and patient monitoring. Consistent results among laboratories enable healthcare providers to make informed decisions based on globally comparable data.
The International Consortium for Harmonization of Clinical Laboratory Results (ICHCLR) was established to drive the harmonization of results among different measurement procedures [3]. The ICHCLR prioritizes measurands by medical importance, coordinates the work of different organizations, and stimulates the development of technical and regulatory processes to achieve harmonization. The ICHCLR classifies the status of harmonization into the following categories: active, adequate/maintain, inactive, incomplete, and needed [4].
To objectively assess the harmonization status of each test, we hypothesized that quantitatively expressing the degree of harmonization would be more useful than qualitatively describing it as a “status,” as done by the ICHCLR. To our knowledge, no previous studies have quantitatively measured the standardization or harmonization of tests. Therefore, we quantitatively assessed the degree of standardization/harmonization of tests used in the real world using external quality assessment (EQA) data from the Korean Association of External Quality Assessment Service (KEQAS).
MATERIALS AND METHODS
This study was exempt from institutional review board (IRB) approval as patient data were not collected (IRB No.: AMC 2023-1040). This study was conducted in accordance with the principles of the Declaration of Helsinki and its amendments.
Collection of EQA data for simulation
In KEQAS proficiency tests (PTs), non-commutable QC materials can show a matrix effect in routine clinical chemistry parameters (e.g., electrolytes and proteins). Therefore, we selected tests that use commutable materials to exclude the matrix effect. Accuracy-based (AB)PTs, including HbA1c, creatinine, total cholesterol, HDL-cholesterol, and triglyceride, were the focus of this study. Additionally, tests for three tumor markers, namely alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), and prostate-specific antigen (PSA), were examined. For each test, a minimum of 12 EQA samples were used, with EQA data collected between January 2021 and December 2022.
Each HbA1c EQA material from a single donor was immediately aliquoted into vials and shipped on the same day at 4°C. The EQA materials for creatinine and lipids were produced with reference to the CLSI guideline C37-A, a reliable standard for creating commutable reference materials [5]. The EQA materials for tumor markers were prepared by spiking high-concentration patient samples into pooled serum from fresh frozen plasma and were evaluated and verified for commutability [6]. The data, which were divided into peer and sub-peer groups, adhered to the classification system established by KEQAS [7]. In KEQAS, for general chemistry, peer groups are based on the same methods and are further divided into reagent manufacturer-based sub-peer groups. For tumor markers, peer groups are based on laboratories using analyzers from the same manufacturer and are further subdivided into instrument- or reagent-based sub-peer groups.
Calculation of bias%, CV%, and total analytical error (TAE)%
According to the KEQAS evaluation guidelines, groups (peer or sub-peer) with fewer than 10 participating institutions or with fewer than eight institutions after outlier removal were omitted from the analysis because KEQAS does not compute averages for these groups [7].
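The inclusion rule above can be sketched as a simple filter. This is a minimal illustration only; the function name and arguments are hypothetical and are not part of the KEQAS guidelines, and the outlier-removal step itself is not shown because the text does not describe it.

```python
# Thresholds taken from the inclusion rule described in the text.
MIN_PARTICIPANTS = 10      # minimum participating institutions
MIN_AFTER_OUTLIERS = 8     # minimum institutions left after outlier removal


def is_group_evaluable(n_participants: int, n_after_outlier_removal: int) -> bool:
    """Return True if a peer or sub-peer group is kept for analysis.

    Hypothetical helper: a group is omitted when it has fewer than 10
    participating institutions or fewer than 8 after outlier removal.
    """
    return (
        n_participants >= MIN_PARTICIPANTS
        and n_after_outlier_removal >= MIN_AFTER_OUTLIERS
    )
```

For example, a group with 12 participants of which 7 remain after outlier removal would be omitted, whereas a group with 10 participants and 8 remaining would be kept.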
To calculate the average bias of peer groups or individual tests, the true value was employed for ABPTs. For non-ABPTs without a true value, calculations were performed using both the sub-peer group mean and the peer group mean. The peer group mean represents the average of the sub-peer group means within that peer group, whereas the overall mean is the average of the peer group means among all tests. In this study designed to observe harmonization among members of a peer group, we did not use the overall mean when there was a dominant sub-peer group, as doing so might have led to distorted results.
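As a rough sketch of the calculation described above, the peer group mean can be taken as the average of its sub-peer group means, and bias% can be expressed relative to a target value. The bias% formula used here, 100 × (mean − target) / target, is the conventional definition and is an assumption, since the text does not spell it out; the function names and numbers are hypothetical.

```python
from statistics import mean


def peer_group_mean(sub_peer_means):
    """Average of the sub-peer group means within one peer group."""
    return mean(sub_peer_means)


def bias_percent(group_mean, target):
    """Relative bias in percent (assumed conventional formula).

    For ABPTs the target would be the true value; for non-ABPTs it would
    be the sub-peer group mean or the peer group mean.
    """
    return 100.0 * (group_mean - target) / target


# Hypothetical sub-peer group means for one EQA sample.
sub_peer_means = [5.0, 5.2, 5.4]
pg_mean = peer_group_mean(sub_peer_means)
pg_bias = bias_percent(pg_mean, target=5.0)
```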
The CV% for each EQA sample was determined based on the results from the participating institutions, and the values were averaged across the 12 samples. The peer group CV% was defined as the mean of the sub-peer group CV% values within the peer group, and the overall CV% as the mean of the sub-peer group CV% values across the entire test.
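The averaging described above might look like the following sketch. It assumes CV% = 100 × SD / mean with the sample standard deviation (n − 1 denominator); the text does not state which SD estimator was used, and the function names and data are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(results):
    """CV% of the institutions' results for one EQA sample.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(results) / mean(results)


def mean_cv_percent(results_by_sample):
    """Average the per-sample CV% values across all EQA samples."""
    return mean([cv_percent(results) for results in results_by_sample])


# Hypothetical results for two EQA samples from three institutions each.
example = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0]]
average_cv = mean_cv_percent(example)
```

The same averaging step can then be repeated one level up, taking the mean of the sub-peer group CV% values to obtain the peer group CV%.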
The real-world (RW)-TAE% was calculated using the derived bias% and CV%, as follows: RW-TAE% = |bias%| + 1.65 × CV%. The factor 1.65 (one-sided estimate) implies that 95% of the results fall within the total allowable error (TEa) limit [8].
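The RW-TAE% formula above translates directly into code. The function and parameter names below are illustrative, and the input values are hypothetical rather than study data.

```python
def rw_tae_percent(bias_pct, cv_pct, z=1.65):
    """Real-world total analytical error: |bias%| + z * CV%.

    z = 1.65 is the one-sided 95% factor given in the text.
    """
    return abs(bias_pct) + z * cv_pct


# Hypothetical inputs: a bias of -2.0% and a CV of 2.0%.
example_tae = rw_tae_percent(-2.0, 2.0)  # 2.0 + 1.65 * 2.0
```

Because the absolute value of the bias is used, a negative and a positive bias of the same magnitude contribute equally to the RW-TAE%.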
Calculation of the real-world harmonization index (RWHI) and comparison with the ICHCLR harmonization status
The RWHI was calculated using the following formula: RWHI = (RW-TAE%)/(biological variation [BV]-based TEa%).
The European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) BV-based minimum, desirable, and optimal TEa% levels for the measurements were retrieved from the EFLM BV database on December 21, 2023 [9]. Within these categories, “optimal” indicates no need for further assay improvement, “desirable” indicates satisfactory performance, and “minimum” indicates room for assay improvement [10]. The factors for the optimal and minimum performance specifications are arbitrarily set to 1/2 and 3/2 of the desirable, respectively (in cases of imprecision, the factor is 0.25 for optimal, 0.5 for desirable, and 0.75 for minimum; in terms of bias, the values are 0.125 for optimal, 0.25 for desirable, and 0.375 for minimum) [9].
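To illustrate how the factors listed above relate to a TEa% value, the sketch below applies them to the within-subject (CVI) and between-subject (CVG) biological variation using the conventional combination TEa = 1.65 × allowable imprecision + allowable bias. That combination and the CVI/CVG inputs are assumptions for illustration; in this study the TEa% values were retrieved from the EFLM database rather than computed, and the numbers below are hypothetical.

```python
import math

# (imprecision factor, bias factor) per performance level, as listed in the text.
FACTORS = {
    "optimal": (0.25, 0.125),
    "desirable": (0.5, 0.25),
    "minimum": (0.75, 0.375),
}


def bv_based_tea_percent(cvi, cvg, level="desirable", z=1.65):
    """BV-based TEa% under the assumed conventional model.

    allowable imprecision = factor * CVI
    allowable bias        = factor * sqrt(CVI^2 + CVG^2)
    """
    imprecision_factor, bias_factor = FACTORS[level]
    allowable_imprecision = imprecision_factor * cvi
    allowable_bias = bias_factor * math.sqrt(cvi**2 + cvg**2)
    return z * allowable_imprecision + allowable_bias


# Hypothetical biological variation: CVI = 6.0%, CVG = 8.0%.
tea = {level: bv_based_tea_percent(6.0, 8.0, level) for level in FACTORS}
```

With these inputs, the optimal and minimum values come out at 1/2 and 3/2 of the desirable value, matching the scaling described in the text.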
When the minimum, desirable, and optimal TEa% values were used as the denominator, the results were classified as minimum, desirable, and optimal RWHI, respectively. Minimum, desirable, and optimal RWHIs of ≤1 were arbitrarily considered to reflect the achievement of the corresponding harmonization levels. We compared the minimum, desirable, and optimal RWHI values with the harmonization status provided by the ICHCLR.
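The RWHI calculation and the ≤1 criterion can be sketched as follows. The helper that reports the strictest level achieved is an illustrative summary and is not a procedure stated in the text; the TEa% values and names are hypothetical.

```python
def rwhi(rw_tae_pct, tea_pct):
    """Real-world harmonization index: RW-TAE% divided by BV-based TEa%."""
    return rw_tae_pct / tea_pct


def achieved_level(rw_tae_pct, tea_by_level):
    """Return the strictest level whose RWHI is <= 1 (illustrative helper)."""
    # Check from the strictest specification to the loosest.
    for level in ("optimal", "desirable", "minimum"):
        if rwhi(rw_tae_pct, tea_by_level[level]) <= 1:
            return level
    return "below minimum"


# Hypothetical TEa% values for one measurand.
tea_by_level = {"optimal": 3.7, "desirable": 7.4, "minimum": 11.1}

level_a = achieved_level(5.3, tea_by_level)   # desirable RWHI is about 0.72
level_b = achieved_level(12.0, tea_by_level)  # minimum RWHI is about 1.08
```

In this example, an RW-TAE% of 5.3 exceeds the optimal TEa% but not the desirable one, so the test would be reported as reaching the desirable level.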
DISCUSSION
Artificial intelligence (AI) can play a crucial role in modern healthcare, particularly in improving disease diagnosis, optimizing treatment, predicting prognosis and outcomes, developing drugs, and improving public health [11]. In the era of AI, big data are crucial and play a pivotal role in the advancement of AI technologies [12]. Big data facilitate effective model training and the discovery of complex patterns in diverse health datasets, thereby improving the accuracy of AI in diagnosing conditions, identifying health trends, and creating personalized treatment plans.
Effective AI depends on access to substantial amounts of high-quality data; the source, size, and quality of the data can significantly influence the development of AI models [13]. Laboratory medicine is an essential element of the healthcare system, and laboratory data significantly contribute to objective clinical data, playing an integral role in numerous clinical decisions [14]. Therefore, ensuring standardized/harmonized laboratory data in big data, including structure, coding, and results, is essential for the effectiveness of AI in healthcare [15-17]. Harmonization enables clinical interoperability and supports machine learning using various platforms [18]. Non-harmonized tests should be interpreted separately to avoid misinterpretation [19]. Incorrect harmonization assumptions risk patient safety, and missing data hinder public health analysis [18].
To our knowledge, no previous studies have quantitatively measured the standardization/harmonization of tests to reflect their actual level of harmonization. Therefore, we used KEQAS EQA data to assess the degree of standardization/harmonization of laboratory tests in real-world settings. The differences with the existing ICHCLR harmonization levels [4] are provided in Table 2. Among the tests labeled “adequate/maintain” by the ICHCLR, including HbA1c, creatinine, AFP, total cholesterol, and triglyceride, the latter two achieved optimal harmonization. AFP reached a desirable level, creatinine achieved the minimum level, and the harmonization level of HbA1c was below the minimum. HDL-cholesterol, classified as “incomplete” by the ICHCLR, had a desirable harmonization level. CEA and PSA, labeled “needed” for harmonization by the ICHCLR, had optimal and desirable levels, respectively.
The ICHCLR categorization is based on a comprehensive assessment, including evaluations of inter-laboratory CV%, inter-method CV%, and/or bias%, primarily using data from EQA programs. The assessment also considers the availability of reference materials, the presence of reference measurement procedures, and the traceability of commercially available calibrators. However, because of the various evaluation methods for each test used by the ICHCLR and the absence of a predefined schedule for timely data updates, assessing the current status of harmonization remains challenging. The harmonization status from the ICHCLR does not integrate and reflect the test methods used in the real world and does not allow quantitative comparison of the degree of standardization or harmonization of each test.
The discrepancy between our results and the ICHCLR harmonization status may be due to three factors. First, the ICHCLR evaluation criteria, such as the availability of reference materials and traceability assessment of calibrators, were not used in our study. Second, we used bias% and CV% and employed a quantitative average from peer group data, whereas the ICHCLR subjectively used overall EQA data. Finally, while the ICHCLR criteria were based on global data, we only used EQA results from Korea.
KEQAS classifies peer and sub-peer groups differently depending on the test. In ABPTs, the peer group represents the test method, whereas the sub-peer group corresponds to the reagent manufacturer. In contrast, for tumor markers, the peer group represents the instrument manufacturer, and the sub-peer group pertains to the model of the instrument from the same manufacturer.
In the ABPTs, which included HbA1c, creatinine, and HDL-cholesterol, variations were noted in the harmonization levels among peer groups. For HbA1c, Arkray, Bio-Rad, and Tosoh, which use high-performance liquid chromatography, had RWHI values between 1.1 and 1.3. In contrast, Abbott, Hitachi, and i-SENS, which use an enzymatic method, had RWHI values ranging from 1.3 to 2.5. Roche and SD BIOSENSOR, which use an immunoassay, had RWHI values ranging from 1.6 to 2.5. Although none of the methods met the minimum harmonization level, the high-performance liquid chromatography method can be considered better harmonized than the other test methods. Therefore, this finding may serve as a useful reference when selecting a test instrument. The creatinine test generally met the minimum harmonization level; however, considerable variation was found depending on the peer group, ranging from below the minimum level to a desirable harmonization level. Therefore, laboratories using creatinine test methods with lower harmonization levels should consider switching to test methods with higher harmonization levels or opting for products with improved harmonization levels within the same test method. For the HDL-cholesterol test, all peer groups achieved a desirable harmonization level, except the enzymatic with immune inhibition peer group, which was at the minimum harmonization level. Further studies are needed to investigate the underlying reasons for the differences among the test methods.
In KEQAS, the tumor marker peer group consists of four companies: Abbott, Beckman Coulter, Roche, and Siemens. AFP and PSA achieved a desirable harmonization level, with the Abbott peer group excelling in AFP and the Roche group excelling in PSA. A high harmonization level within a peer group indicates strong consistency in test results among devices, which may be associated with the types of instrument platforms and reagents. In the Roche peer group, a single platform (cobas e series) was employed. In the Abbott peer group, two platforms (Alinity I and Architect) were employed, but the reagents were the same. In contrast, Siemens had two platforms (Centaur and Atellica IM), as did Beckman Coulter (Access2 and UniCel DxI800).
The recent proposal of Test Result Harmonization Status for the United States Core Data for Interoperability (USCDI) by the College of American Pathologists, though not fully integrated into the USCDI, can serve as an important criterion for future big data applications [18]. Our quantitative approach to expressing harmonization levels may serve as a foundation for such a criterion. Currently, the RWHI calculations depend on TEa derived from biological variation. Therefore, to enhance interoperability with big data, future TEa determinations should be based on big data standards.
This study focused on eight test items using commutable materials without matrix effects among PT items with long-term data. Future research opportunities may arise if other EQA providers release results based on commutable materials for additional tests. The RWHI developed in our study is based on test methods and instrument results used in Korea. When EQA data from other countries are analyzed, different outcomes reflecting the harmonization level of each country are expected because of differences in the test methods and equipment used in those countries.
In conclusion, we introduced the concepts of real-world TAE% and RWHI and quantitatively assessed the degree of harmonization for eight tests using KEQAS EQA data. This index can be used to assess interoperability in future big data analyses as it reflects the actual harmonization level of laboratory tests in the field.