Journal List > J Korean Acad Oral Health > v.44(1) > 1144481

Yang, Kim, and Jeong: Prediction of dental caries in 12-year-old children using machine-learning algorithms

Abstract

Objectives

The decayed, missing, and filled teeth (DMFT) index is a representative oral health indicator. Predicting the DMFT index provides an important basis for developing public oral health care projects and caries prevention strategies. In this study, we used data from the 2015 Korean Children's Oral Health Survey to predict the DMFT index and caries risk groups using a statistical technique (multiple linear regression) and four machine-learning algorithms.

Methods

DMFT prediction models were constructed using multiple linear regression and four different machine-learning algorithms: decision tree regressor, decision tree classifier (DTC), random forest regressor, and random forest classifier (RFC). Thereafter, their accuracies were compared.
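Comparing regressors with classifiers on accuracy requires turning continuous regressor outputs into integer DMFT scores. The abstract does not say how this was done, so the following is only a minimal sketch under stated assumptions: regressor outputs are rounded to the nearest integer and clipped to a valid range, and accuracy is the share of exact matches. The function names, the upper bound of 28, and the toy data are all hypothetical.

```python
import math
from typing import Sequence


def to_dmft_label(value: float, max_dmft: int = 28) -> int:
    """Map a continuous regressor output to an integer DMFT score.

    Assumed rule (not taken from the paper): round half up, then clip
    to the range [0, max_dmft].
    """
    rounded = math.floor(value + 0.5)
    return min(max(rounded, 0), max_dmft)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of predictions that exactly match the observed DMFT."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


# Toy data, invented for illustration only.
y_true = [0, 0, 1, 2, 4, 0, 3, 1]
regressor_output = [0.4, 1.2, 0.8, 2.6, 3.1, 0.1, 2.2, 1.4]
classifier_labels = [0, 0, 1, 2, 2, 0, 0, 1]

regressor_labels = [to_dmft_label(v) for v in regressor_output]

print(f"regressor accuracy:  {accuracy(y_true, regressor_labels):.1f}%")
print(f"classifier accuracy: {accuracy(y_true, classifier_labels):.1f}%")
```

With a shared accuracy function, regression-type and classification-type models can be scored on the same scale, which is what a direct comparison of the five models needs.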

Results

For the DMFT predictive model, the prediction accuracies of multiple linear regression and RFC were 15.24% and 43.27%, respectively. The accuracy of RFC prediction was thus 2.84 times that of multiple linear regression. The most important feature in the machine-learning models that predict the DMFT index and the caries risk group was the number of teeth with sealants.
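The 2.84-fold figure can be checked against the two accuracies reported above (15.24% for multiple linear regression and 43.27% for RFC). The snippet below is a simple arithmetic check; the variable names are illustrative.

```python
# Accuracies (%) as reported in the Results paragraph.
mlr_accuracy = 15.24
rfc_accuracy = 43.27

# Ratio of the random forest classifier to multiple linear regression.
ratio = rfc_accuracy / mlr_accuracy
print(f"RFC / MLR accuracy ratio: {ratio:.2f}")
```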

Conclusions

Using data from the 2015 Korean Children's Oral Health Survey, which can be regarded as big data among oral health surveys in Korea, this study found that machine-learning models were more useful than a statistical model for predicting the DMFT index and caries risk in 12-year-old children. Machine-learning models can therefore be expected to be useful for predicting DMFT scores.

Table 1. Distribution of study subjects by characteristics

| Characteristics | Classification | N | % |
|---|---|---|---|
| Gender | Men | 11,942 | 50.4 |
| | Women | 11,760 | 49.6 |
| Region | City | 18,496 | 78.0 |
| | Rural area | 5,206 | 22.0 |
| Number of pit and fissure sealant | 0 | 10,345 | 43.6 |
| | 1 | 2,546 | 10.7 |
| | 2 | 2,937 | 12.4 |
| | 3 | 2,138 | 9.0 |
| | 4 | 3,285 | 13.9 |
| | 5 | 636 | 2.7 |
| | 6 | 573 | 2.4 |
| | 7 | 306 | 1.3 |
| | 8 | 347 | 1.5 |
| | 9 | 114 | 0.5 |
| | 10 | 109 | 0.5 |
| | 11 | 100 | 0.4 |
| | 12 | 98 | 0.4 |
| | 13 | 40 | 0.2 |
| | 14 | 55 | 0.2 |
| | 15 | 38 | 0.2 |
| | 16 | 35 | 0.1 |
| Perceived oral health status | Very good | 1,349 | 5.7 |
| | Good | 9,034 | 38.1 |
| | Fair | 10,932 | 46.1 |
| | Poor | 2,231 | 9.4 |
| | Very poor | 156 | 0.7 |
| Dental treatment demand for the past one year | Yes | 15,267 | 64.4 |
| | No | 8,435 | 35.6 |
| Experience of toothache for the past one year | Yes | 5,046 | 21.3 |
| | No | 18,656 | 78.7 |
| Frequency of snack intake per day | No intake | 2,780 | 11.7 |
| | Once | 7,852 | 33.1 |
| | 2 times | 7,797 | 32.9 |
| | 3 times | 3,692 | 15.6 |
| | 4 and over | 1,581 | 6.7 |
| Number of oral hygiene auxiliaries in use | 0 | 14,428 | 60.9 |
| | 1 | 6,787 | 28.6 |
| | 2 | 2,002 | 8.4 |
| | 3 | 418 | 1.8 |
| | 4 | 59 | 0.2 |
| | 5 | 8 | 0.0 |
| Total | | 23,702 | 100.0 |

Data source: 2015 Korean Children's Oral Health Survey.

Table 3. Multiple linear regression model for prediction of DMFT in 12-year-olds

| Model | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 0.302 | 0.075 | | 4.006 | 0.000 |
| Gender | 0.519 | 0.032 | 0.098 | 16.030 | 0.000 |
| Region | 0.220 | 0.039 | 0.034 | 5.632 | 0.000 |
| Number of pit and fissure sealant | -0.235 | 0.006 | -0.224 | -36.394 | 0.000 |
| Perceived oral health status | 0.407 | 0.022 | 0.117 | 18.515 | 0.000 |
| Dental treatment demand for the past one year | 0.966 | 0.034 | 0.175 | 28.128 | 0.000 |
| Experience of toothache for the past one year | 0.350 | 0.041 | 0.054 | 8.550 | 0.000 |
| Frequency of snack intake per day | 0.035 | 0.015 | 0.014 | 2.329 | 0.020 |
| Number of oral hygiene auxiliaries using | 0.074 | 0.022 | 0.021 | 3.388 | 0.001 |

R Square: 0.119. Dependent variable: DMFT. Variables entered with the Enter method. Coding: Gender: Men=0, Women=1 / Region: City=0, Rural area=1 / Number of pit and fissure sealant: 0-16 / Perceived oral health status: Very good=1 to Very poor=5 / Dental treatment demand for the past one year: Yes=1, No=0 / Experience of toothache for the past one year: Yes=1, No=0 / Frequency of snack intake per day: No intake=1, once=2, 2 times=3, 3 times=4, 4 and over=5 / Number of oral hygiene auxiliaries using: 0-5.
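As a worked illustration of the regression in Table 3, the sketch below applies the unstandardized coefficients (B) and the variable coding from the table note to two invented children. The dictionary keys and the function name are hypothetical labels chosen for this example; only the numeric coefficients come from the table.

```python
# Unstandardized coefficients (B) from Table 3; key names are illustrative.
INTERCEPT = 0.302
COEFFICIENTS = {
    "gender": 0.519,               # Men=0, Women=1
    "region": 0.220,               # City=0, Rural area=1
    "sealants": -0.235,            # number of sealed teeth, 0-16
    "perceived_status": 0.407,     # Very good=1 ... Very poor=5
    "treatment_demand": 0.966,     # Yes=1, No=0
    "toothache": 0.350,            # Yes=1, No=0
    "snack_frequency": 0.035,      # No intake=1 ... 4 and over=5
    "hygiene_auxiliaries": 0.074,  # number in use, 0-5
}


def predict_dmft(child: dict) -> float:
    """Return the linear-regression estimate of DMFT for one coded record."""
    missing = COEFFICIENTS.keys() - child.keys()
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[k] * child[k] for k in COEFFICIENTS)


# Two invented records, coded as in the table note.
boy_with_sealants = {
    "gender": 0, "region": 0, "sealants": 4, "perceived_status": 3,
    "treatment_demand": 0, "toothache": 0, "snack_frequency": 3,
    "hygiene_auxiliaries": 1,
}
girl_without_sealants = {
    "gender": 1, "region": 1, "sealants": 0, "perceived_status": 4,
    "treatment_demand": 1, "toothache": 1, "snack_frequency": 5,
    "hygiene_auxiliaries": 0,
}

print(f"{predict_dmft(boy_with_sealants):.3f}")      # 0.762
print(f"{predict_dmft(girl_without_sealants):.3f}")  # 4.160
```

Because the model explains only about 12% of the variance (R Square: 0.119), such point estimates are rough, which is consistent with the low accuracy reported for multiple linear regression.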

Table 4.
Accuracy (%) of the predicted DMFT in each machine learning algorithm
DMFT Frequency of DMFT Predicted DMFT MLR DTR DTC RFR RFC
0 10,140 0 64.9 87.0 48.2 86.8 54.7
1 2,932 1 14.9 21.0 20.9 19.3 57.6
2 2,773 2 12.7 16.6 17.0 15.3 51.7
3 1,764 3 9.3 12.6 5.1 10.8 45.7
4 2,963 4 20.8 27.5 13.0 23.3 37.1
5 866 5 0 19.8 4.6 7.9 50.2
6 798 6 30.4 10.2 12.7 44.7
7 381 7 32.2 12.7 3.8 43.7
8 384 8 52.8 3.8 44.1
9 185 9 68.4 4.3 71.4
10 135 10 78.6 11.1 72.2
11 98 11 100.0 0 56.3
12 94 12 66.7 0 77.8
13 66 13 100.0 50.0 35.3
14 54 14 100.0 66.7
15 25 15 100.0 44.4
16 34 16 100.0 100.0
17 4
18 1
19 4
20 1
Total 23,702

MLR, Multiple linear regression; DTR, Decision tree regressor; DTC, Decision tree classifier; RFR, Random forest regressor; RFC, Random forest classifier.
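Table 4 reports one accuracy value per predicted DMFT score. The table itself does not define this value. One plausible reading, assumed here, is the share of cases assigned a given predicted score whose observed DMFT equals that score, with no entry for scores a model never predicts. The function name and toy data below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_predicted_value(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[int, float]:
    """Percentage of correct predictions, grouped by predicted DMFT score.

    Scores that never appear in y_pred get no entry, which would match
    an empty cell under the assumed reading of the table.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for observed, predicted in zip(y_true, y_pred):
        total[predicted] += 1
        if observed == predicted:
            correct[predicted] += 1
    return {k: 100.0 * correct[k] / total[k] for k in sorted(total)}


# Toy data, invented for illustration only.
y_true = [0, 0, 0, 1, 1, 2, 2, 4]
y_pred = [0, 0, 1, 1, 0, 2, 1, 2]

for score, pct in accuracy_by_predicted_value(y_true, y_pred).items():
    print(f"predicted DMFT {score}: {pct:.1f}% correct")
```

In this toy example the score 4 is never predicted, so it has no accuracy entry.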

Table 5. The feature importance of each machine learning algorithm

| Feature | Decision tree regressor | Decision tree classifier | Random forest regressor | Random forest classifier |
|---|---|---|---|---|
| Gender | 0.0480 | 0.0455 | 0.0501 | 0.0275 |
| Region | 0.0492 | 0.0902 | 0.0603 | 0.0620 |
| Number of pit and fissure sealant | 0.3460 | 0.2298 | 0.3132 | 0.3438 |
| Perceived oral health status | 0.1109 | 0.1232 | 0.1089 | 0.1342 |
| Dental treatment demand for the past one year | 0.1313 | 0.0372 | 0.1117 | 0.0447 |
| Experience of toothache for the past one year | 0.0441 | 0.0635 | 0.0598 | 0.0405 |
| Frequency of snack intake per day | 0.1527 | 0.2170 | 0.1642 | 0.1868 |
| Number of oral hygiene auxiliaries using | 0.1178 | 0.1936 | 0.1318 | 0.1604 |
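The feature importances in Table 5 can be ranked per model to see which variables dominate. The sketch below copies the values from the table; the data structure and function name are illustrative choices, not part of the study.

```python
MODELS = ("DTR", "DTC", "RFR", "RFC")

# Feature importances from Table 5, in the column order given by MODELS.
FEATURE_IMPORTANCE = {
    "Gender": (0.0480, 0.0455, 0.0501, 0.0275),
    "Region": (0.0492, 0.0902, 0.0603, 0.0620),
    "Number of pit and fissure sealant": (0.3460, 0.2298, 0.3132, 0.3438),
    "Perceived oral health status": (0.1109, 0.1232, 0.1089, 0.1342),
    "Dental treatment demand for the past one year": (0.1313, 0.0372, 0.1117, 0.0447),
    "Experience of toothache for the past one year": (0.0441, 0.0635, 0.0598, 0.0405),
    "Frequency of snack intake per day": (0.1527, 0.2170, 0.1642, 0.1868),
    "Number of oral hygiene auxiliaries using": (0.1178, 0.1936, 0.1318, 0.1604),
}


def ranked_features(model: str) -> list:
    """Return (feature, importance) pairs for one model, largest first."""
    column = MODELS.index(model)
    pairs = [(name, values[column]) for name, values in FEATURE_IMPORTANCE.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for model in MODELS:
    ranking = ranked_features(model)
    total = sum(value for _, value in ranking)
    top = ", ".join(f"{name} ({value:.4f})" for name, value in ranking[:2])
    print(f"{model}: sum={total:.4f}; top 2: {top}")
```

In all four models the number of pit and fissure sealants ranks first and the frequency of snack intake ranks second, and each column sums to approximately 1.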