J Korean Soc Med Inform. v.15(2)

Lee and Park: Basic Concepts and Principles of Data Mining in Clinical Practice

Abstract

Recently, many hospitals have adopted clinical data warehouses (CDWs) as well as electronic medical records. These new hospital information systems inevitably generate very large amounts of clinical data that might be useful for further analysis. However, the electronic clinical data in a CDW are usually byproducts of clinical practice rather than products of research. They therefore include inconsistent and sometimes erroneous information that may lack the specific context of the clinical situation. Data miners come from various academic backgrounds, such as electronics, informatics, statistics, biomedicine, and public health. If the complex circumstances surrounding the clinical data are not well understood, investigators performing data mining in clinical fields may have difficulty assessing the information they encounter. Here, we introduce some basic concepts and principles of data mining in clinical fields, including legal and ethical considerations as well as technical concerns.

Figures and Tables

Figure 1
An example of a descriptive summary of selected features

Figure 2
Data quality and outlier summary

Figure 3
An example topology of an artificial neural network model

Figure 4
An example of a decision tree model

Figure 5
An example of a Bayesian network model

Acknowledgement

The authors would like to acknowledge Professor Kyi Young Lee, Dept. of Biomedical Informatics, School of Medicine, Ajou University, for his detailed review of the manuscript.
