
Park, Do, Choi, Sim, Yang, Eo, Woo, Lee, Jung, and Oh: Principles for evaluating the clinical implementation of novel digital healthcare devices

Abstract

With growing interest in novel digital healthcare devices, such as artificial intelligence (AI) software for medical diagnosis and prediction, and their potential impacts on healthcare, discussions have taken place regarding the regulatory approval, coverage, and clinical implementation of these devices. Despite their potential, ‘digital exceptionalism’ (i.e., skipping the rigorous clinical validation of such digital tools) is creating significant concerns for patients and healthcare stakeholders. This white paper presents the positions of the Korean Society of Radiology, a leader in medical imaging and digital medicine, on the clinical validation, regulatory approval, coverage decisions, and clinical implementation of novel digital healthcare devices, especially AI software for medical diagnosis and prediction, and explains the scientific principles underlying those positions. Mere regulatory approval by the Food and Drug Administration of Korea, the United States, or other countries should be distinguished from coverage decisions and widespread clinical implementation, as regulatory approval only indicates that a digital tool is allowed for use in patients, not that the device is beneficial or recommended for patient care. Coverage or widespread clinical adoption of AI software tools should require a thorough clinical validation of safety, high accuracy proven by robust external validation, documented benefits for patient outcomes, and cost-effectiveness. The Korean Society of Radiology puts patients first when considering novel digital healthcare tools, and as an impartial professional organization that follows scientific principles and evidence, strives to provide correct information to the public, make reasonable policy suggestions, and build collaborative partnerships with industry and government for the good of our patients.


Figure 1.
Hierarchy of artificial intelligence-related terms.
Figure 2.
Brief schematic summary of the processes for evaluating a novel health technology used by the Health Insurance Review and Assessment Service (HIRA) and the National Evidence-based Healthcare Collaborating Agency (NECA).
Table 1.
A checklist for robust clinical validation of the performance of a machine-learning algorithm

Characteristics of the dataset used for clinical validation
- Is it representative of the target patients in real-world practice for which the algorithm will be used?
- Was it obtained from institutions other than those that provided the data for algorithm development?
- Was it derived from multiple institutions?
- Was it captured with scanners different from those used to create the data for algorithm development (e.g., a computed tomography scanner from a different vendor)?a)
- Was it obtained using acquisition parameters different from those used to create the data for algorithm development (e.g., a different radiation dose setting or image reconstruction method for computed tomography)?a)
- Was it collected prospectively?

The more of these questions receive a "Yes" answer, the more generalizable the algorithm's performance is. a) Applicable to imaging data.
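To make the kind of external validation the checklist targets concrete, the following is a minimal sketch in Python with scikit-learn (our choice; the article contains no code). The synthetic arrays, the logistic-regression stand-in for a diagnostic algorithm, and the 0.5 operating threshold are all hypothetical assumptions for illustration; the point is that the model is developed on internal data only and then applied, frozen, to data from other institutions, with no retraining, recalibration, or threshold tuning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical data: feature vectors and binary disease labels.
# "Internal" data mimic the developing institution; "external" data
# mimic other institutions/scanners, with a deliberate distribution shift.
rng = np.random.default_rng(0)
X_internal = rng.normal(size=(500, 10))
y_internal = (X_internal[:, 0] + rng.normal(size=500) > 0).astype(int)
X_external = rng.normal(loc=0.3, size=(200, 10))  # shifted distribution
y_external = (X_external[:, 0] + rng.normal(size=200) > 0).astype(int)

# The algorithm is developed (trained) on internal data only.
model = LogisticRegression().fit(X_internal, y_internal)

# External validation: the frozen model is applied to external data.
# No retraining, recalibration, or threshold tuning happens here.
scores = model.predict_proba(X_external)[:, 1]
auc = roc_auc_score(y_external, scores)

# Sensitivity and specificity at a prespecified operating point (0.5 here).
pred = (scores >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_external, pred).ravel()
print(f"External AUC: {auc:.3f}")
print(f"Sensitivity: {tp / (tp + fn):.3f}, Specificity: {tn / (tn + fp):.3f}")
```

In practice, the external set would be a prospectively collected, multi-institution dataset meeting the criteria in Table 1, and a drop in performance relative to internal testing would signal limited generalizability.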
