“Big data” are increasingly being used for research in healthcare [1] and artificial intelligence, and laboratory results account for a large proportion of big data in healthcare. As most test results from clinical laboratories are quantitative, big data researchers who are not experts in laboratory medicine often assume that all numerical results are suitable for research. However, this is not true. Despite long-standing standardization and harmonization efforts [2-4], large biases are observed when the same sample is tested in different laboratories. Even for standardized or harmonized test items, big data results may be biased if unreliable results from certain laboratories are included. Therefore, selecting reliable, research-grade, real-world laboratory results, originally obtained for clinical purposes, for use as secondary data in big data analysis is challenging [5].
In this issue of Annals of Laboratory Medicine, Cho, et al. [6] propose a strategy for evaluating whether laboratory results are of sufficient quality for big data research. They analyzed more than 30,000 external quality assessment (EQA) results for seven test items, obtained using commutable frozen human serum pools in the Korean Association of External Quality Assessment Service (KEQAS) program [7]. EQA results for items in the accuracy-based proficiency testing program (HbA1c, creatinine, total cholesterol, and triglycerides) were compared with target values measured using reference measurement procedures in certified reference laboratories. EQA results for alpha-fetoprotein and prostate-specific antigen, for which relevant international standards exist, were compared with peer-group mean values. EQA results for cardiac troponin I (cTnI), whose harmonization is still ongoing, were compared with the all-method mean. The acceptance rates for the seven test items were only 67.5%–100%, 42.9%–100%, and 22.9%–99.5% under the minimum, desirable, and optimum criteria, respectively. The EQA results of KEQAS participants differed significantly according to the quality grade based on total error; for example, the mean percentage bias of cTnI results within the optimum, desirable, minimum, and unacceptable criteria was 4.4%, 6.5%, 7.2%, and 46.0%, respectively. Cho, et al. [6] concluded that even test results that pass the EQA acceptance criteria do not guarantee sufficient quality for inclusion in big data. Thus, when constructing laboratory big data, data quality should be evaluated and poor-quality data excluded.
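The grading logic described above, i.e., computing the percentage bias of an EQA result against its target value and assigning the tightest quality grade whose limit the bias satisfies, can be sketched as follows. This is a minimal illustration: the threshold values, function names, and grade ordering are assumptions for demonstration, not the actual item-specific KEQAS or biological-variation limits.

```python
# Illustrative sketch of grading an EQA result by percentage bias against
# tiered total-allowable-error criteria. The numeric limits below are
# hypothetical placeholders; real limits differ per test item.

def percentage_bias(measured: float, target: float) -> float:
    """Absolute percentage bias of an EQA result versus the target value."""
    return abs(measured - target) / target * 100.0

def grade_result(measured: float, target: float, criteria: dict[str, float]) -> str:
    """Assign the tightest grade whose bias limit is satisfied.

    `criteria` maps grade name -> maximum allowable percentage bias,
    ordered from tightest (optimum) to loosest (minimum).
    """
    bias = percentage_bias(measured, target)
    for grade, limit in criteria.items():
        if bias <= limit:
            return grade
    return "unacceptable"

# Hypothetical criteria for a single analyte (example limits only)
example_criteria = {"optimum": 5.0, "desirable": 10.0, "minimum": 15.0}

print(grade_result(102.0, 100.0, example_criteria))  # bias 2.0% -> "optimum"
print(grade_result(120.0, 100.0, example_criteria))  # bias 20.0% -> "unacceptable"
```

Ordering the criteria from tightest to loosest means each result receives the best grade it qualifies for, mirroring the optimum/desirable/minimum/unacceptable stratification used in the study.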
Although Cho, et al. [6] did not propose a detailed evaluation protocol, they highlighted the necessity of evaluating data quality and established a new evaluation model based on EQA data. Because EQA can guarantee a laboratory’s performance only at a given point in time, whereas big data in healthcare comprise longitudinal patient records, the accumulated EQA results of each laboratory must be analyzed to determine whether its data can be included in big data analysis [5]. Further evaluation of other test items is warranted.
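One way such longitudinal screening could be operationalized is sketched below: laboratories whose accumulated share of acceptable EQA events falls below a cut-off are excluded from the big data set. The laboratory identifiers, record layout, and the 90% cut-off are illustrative assumptions, not part of the cited studies.

```python
# Sketch: screen laboratories by their accumulated EQA record before
# admitting their results into a big-data set. The 90% acceptance-rate
# cut-off and the (lab_id, was_acceptable) record layout are assumptions.

from collections import defaultdict

def eligible_labs(eqa_records: list[tuple[str, bool]], min_rate: float = 0.9) -> set[str]:
    """Return laboratories whose fraction of acceptable EQA events
    over the whole observation period is at least `min_rate`."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [acceptable, total]
    for lab, ok in eqa_records:
        totals[lab][1] += 1
        if ok:
            totals[lab][0] += 1
    return {lab for lab, (good, n) in totals.items() if good / n >= min_rate}

# Hypothetical accumulated EQA events for two laboratories
records = [("LAB-A", True), ("LAB-A", True), ("LAB-B", True), ("LAB-B", False)]
print(eligible_labs(records))  # LAB-A (2/2) qualifies; LAB-B (1/2) does not
```

In practice, the screening would be applied per test item and per time window, since a laboratory may perform well for one analyte and poorly for another.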
In summary, Cho, et al. [6] showed that participants’ EQA results can serve as a surrogate for evaluating the quality of real-world laboratory data. As specialists in laboratory medicine, we should continue to develop appropriate methods for assessing the quality of research-grade laboratory data in the big data era.
REFERENCES
1. Wang L, Alexander CA. Big data analytics in medical engineering and healthcare: methods, advances and challenges. J Med Eng Technol 2020;44:267-83.
2. Jeong T, Cho E, Lee K, Lee W, Yun YM, Chun S, et al. Recent trends in creatinine assays in Korea: long-term accuracy-based proficiency testing survey data by the Korean Association of External Quality Assessment Service (2011-2019). Ann Lab Med 2021;41:372-9.
3. Yoon YA, Lee YW, Kim S, Lee K, Park HD, Chun S, et al. Standardization status of total cholesterol concentration measurement: analysis of Korean External Quality Assessment data. Ann Lab Med 2021;41:366-71.
4. Nam Y, Lee JH, Kim SM, Jun SH, Song SH, Lee K, et al. Periodic comparability verification and within-laboratory harmonization of clinical chemistry laboratory results at a large healthcare center with multiple instruments. Ann Lab Med 2022;42:150-9.
5. Kim S, Cho EJ, Jeong TD, Park HD, Yun YM, Lee K, et al. Proposed model for evaluating real-world laboratory results for big data research. Ann Lab Med 2023;43:104-7.
6. Cho EJ, Jeong TD, Kim S, Park HD, Yun YM, Chun S, et al. A new strategy for evaluating the quality of laboratory results for big data research: using external quality assessment survey data (2010-2020). Ann Lab Med 2023;43:425-33.
7. Kim S, Lee K, Park HD, Lee YW, Chun S, Min WK. Schemes and performance evaluation criteria of Korean Association of External Quality Assessment (KEQAS) for improving laboratory testing. Ann Lab Med 2021;41:230-9.