Abstract
Objective
To develop and explore the usefulness of an artificial intelligence system for the prediction of the need for dental extractions during orthodontic treatments based on gender, model variables, and cephalometric records.
Methods
The gender, model variables, and radiographic records of 214 patients were obtained from an anonymized data bank containing 314 cases treated by two experienced orthodontists. The data were processed using automated machine learning software (Auto-WEKA) and used to predict the need for extractions.
Results
By generating and comparing several prediction models, an accuracy of 93.9% was achieved for determining whether extraction is required or not based on the model and radiographic data. When only model variables were used, an accuracy of 87.4% was attained, whereas an accuracy of 72.0% was achieved if only cephalometric information was used.
Conclusions
The use of an automated machine learning system allows the generation of orthodontic extraction prediction models. The accuracy of the optimal extraction prediction models increases with the combination of model and cephalometric data for the analytical process.
Keywords: Extraction vs. non-extraction, Computer algorithm, Decision tree, Orthodontic Index
INTRODUCTION
Malocclusion is a dentofacial anomaly that affects occlusal function, esthetics, and quality of life,
1 and orthodontics is the discipline that encompasses the evaluation, prevention, and treatment of malocclusions. Several methods for identifying the possible causes of malocclusions and treatment approaches have been developed. In recent years, computational approaches, using software systems, have been used in medicine and dentistry to facilitate more efficient diagnostic strategies and better therapeutic results guided by prognostic predictions.
2 Computational approaches have also been used to quantify the subjective impressions of the expert professional, incorporating the expert’s clinical perspective into the systems and making it available to less experienced doctors.
3,4 Among these computational techniques, those from the field of artificial intelligence (AI), specifically machine learning (ML), have received special attention. In ML, a computer is trained to generate a customized model based on a given dataset, and that model is then used for predictions.
2 Widely used examples of complex ML algorithms are neural networks and deep neural networks.
As with several advanced computational techniques, applying ML effectively is an intellectually and technically challenging enterprise. ML requires an expert to organize, analyze, and optimize the data and the models and prevent overfitting.
5 Overfitting is a phenomenon by which excessive iterative learning can increase the goodness-of-fit of the model for the training dataset,
4 with a decrease in the accuracy of the model when applied to the test set or an external database. It is classically handled by splitting the sample into training and test parts, by cross-validation, or by more complex but more reliable methods such as nested cross-validation, which consists of an inner loop of cross-validation nested within an outer loop of cross-validation. The inner loop is equivalent to the validation set and is used for model selection and optimization, while the outer loop is equivalent to the test set and is used for error estimation. Nested cross-validation has been accepted as a viable approach to obtaining an optimally unbiased performance evaluation on the training set while reducing overfitting.
6,7
To address the inherent complexity of ML algorithms, Automated ML (AutoML) systems have been developed. AutoML systems differ from conventional ML systems in that they aim to automatically select, compose, and optimize various ML algorithms for optimal performance on a specified outcome variable. The availability of such systems has facilitated access to AI methods and technology by non-experts.
5,8,9 Several AutoML systems are available, including IBM AutoWatson (IBM Corp., Armonk, NY, USA) and Auto-WEKA (https://github.com/automl/autoweka).
9 Such systems differ mostly in their optimization techniques or user interfaces.
10 The use of AutoML systems within health care can improve health outcomes, reduce costs, and advance clinical research.
11 One of the main benefits of using AutoML systems over conventional systems is their ease of use. This allows non-data science professionals, such as most orthodontic clinicians, to create, test, and run AI systems in their practices without necessarily requiring a highly trained expert to assist them. However, more work needs to be done for the widespread adoption of this technology by healthcare professionals.
11
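The idea of automatically selecting an algorithm and tuning its hyperparameters can be illustrated with a minimal sketch. It uses plain random search over two toy classifiers on a made-up one-dimensional dataset; every name and value here (`DATA`, `SEARCH_SPACE`, `random_search`) is hypothetical, and random search is a deliberately simple stand-in, not the optimization strategy of Auto-WEKA or any other AutoML product.

```python
import random

# Toy 1-D dataset (hypothetical values, not from the study): (feature, label).
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1), (3.0, 0),
        (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1), (1.2, 0), (3.8, 1)]


def threshold_classifier(train, cutoff):
    """Predict 1 when x exceeds a fixed cutoff (the training data is unused)."""
    return lambda x: 1 if x > cutoff else 0


def knn_classifier(train, k):
    """Predict the majority label among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > k else 0
    return predict


# Search space: each algorithm is paired with a sampler for its hyperparameter.
SEARCH_SPACE = {
    "threshold": (threshold_classifier, lambda rng: rng.uniform(0.0, 5.0)),
    "knn": (knn_classifier, lambda rng: rng.choice([1, 3, 5])),
}


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def random_search(data, n_trials=50, seed=0):
    """Return (validation accuracy, algorithm name, hyperparameter) of the best trial."""
    rng = random.Random(seed)
    rows = data[:]
    rng.shuffle(rows)
    split = int(0.67 * len(rows))
    train, valid = rows[:split], rows[split:]
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        build, sample = SEARCH_SPACE[name]
        param = sample(rng)
        score = accuracy(build(train, param), valid)
        if best is None or score > best[0]:
            best = (score, name, param)
    return best


print(random_search(DATA))
```

A single hold-out split is used here only to keep the sketch short; an evaluation on so few validation cases would be far too optimistic in practice.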
The applications of AI in dentistry are rapidly expanding. AI techniques and methods have been used to identify caries in radiographs
12 and predict periodontal
13 and endodontic treatment outcomes,
14 among others. In orthodontics, AI systems have been developed for automatic cephalometric tracing, automated diagnosis,
15 growth prediction,
16 treatment outcome prediction,
17 and cervical maturation determination.
18 To date, 7 papers have reported the development of ML prediction models for orthodontic extractions. Six of these papers reported accuracies of 80%,
3 87.5%,
19 90.5%,
20 93%,
4 94.6%,
21 and 81%,
22 respectively. The seventh article reported several accuracies, according to the model and test, ranging from 65 to 98%.
23 The usefulness of AutoML systems has not been assessed for the generation of models for the prediction of the need for orthodontic extractions.
This study aimed to generate prediction models for the need for dental extractions in orthodontic treatment based on gender, model features, and cephalometric records using an AutoML system.
MATERIALS AND METHODS
The sex, clinical data, and cephalometric data of patients who received comprehensive orthodontic treatment were obtained from an anonymized data bank of cases treated between January 2018 and September 2019 at a single clinical practice run by two orthodontists in Santiago, Chile. These two practitioners, with more than 20 and 40 years of exclusive dedication to orthodontics, respectively, had worked together for 18 years and performed the diagnosis, treatment planning, and treatment of all the individuals included in this sample. The data were anonymized and entered into a spreadsheet before the conception of this study as follows: once a patient had their orthodontic appliances removed, the orthodontist in charge entered the initial clinical information, including the model and radiographic data and whether or not extractions were performed, into a spreadsheet, omitting any data that would make the patient identifiable. Approval for the use of the data from an anonymized databank was obtained from the Research Committee of the Faculty of Odontology at Universidad de los Andes (CPI ODO 26).
Cases were included if they were consecutive comprehensive orthodontic treatments in the permanent dentition performed with either buccal or lingual fixed appliances. Patients were excluded if they had incomplete records, had received orthognathic surgical treatment or first-phase treatment, had one or more teeth other than the third molars absent at baseline, or presented with congenital malformations.
The features used for the development of the optimal predictive models included sex, model variables, and cephalometric data and are shown in
Table 1. The data obtained from the anonymized data bank included seven variables considered relevant for predicting the need for extractions. The cephalometric data included in the anonymized database were obtained from cephalometric tracings by an experienced operator and reviewed by one of the two experienced orthodontists using Dolphin
® Imaging version 11.95 (Dolphin Imaging, Chatsworth, CA, USA). The only dependent variable included was “Extraction” (NO/YES), as described in
Table 1.
Three prediction settings were developed: the first incorporated sex together with the model and cephalometric data, the second incorporated only the model data, and the third incorporated only the cephalometric data. Each setting was entered into Auto-WEKA as a .csv (comma-separated values) file, with accuracy as the metric to be optimized, to identify the optimal model for that setting. The memory limit was set to 2 GB, and the time limits set for Auto-WEKA were 5, 15, 30, and 60 minutes, plus an overnight limit for which the system ran for at least 18 hours. After these settings were entered, the “Run” button was clicked to start the AutoML system. Auto-WEKA first performs an automated attribute (variable) selection process, involving 2 algorithms with up to 4 hyperparameters, which are optimized to select the best variables and exclude redundant and/or irrelevant ones.
9 Auto-WEKA approaches the selected variables of the sample through 37 learning algorithms, each one being evaluated with its respective hyperparameters, which add up to 160.
9 Each hyperparameter can take multiple values, so the number of iterations can become very large in simulations that may run for several hours. To evaluate performance, Auto-WEKA applies 10-fold nested cross-validation by default. This process automatically divides the sample into 10 parts; 9 of them form the training/validation set and the 10th forms the test set. The training/validation set is then subdivided into 10 parts, with 9 used as the training set and the remaining one used as the validation set. This inner division is repeated 10 times so that every part serves once as the validation set. Likewise, the division into training/validation and test sets is repeated 10 times so that every partition serves once as the test set. This method has been described as a viable way of obtaining an optimally unbiased performance estimate while reducing overfitting.
6,7
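The partitioning scheme described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the folds are built by simple striding without shuffling or stratification, and nothing here reproduces Auto-WEKA's internal implementation. The sample size of 214 is taken from this study.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k disjoint folds of nearly equal size."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=10):
    """Yield (train, validation, test) index lists for every inner split."""
    all_indices = list(range(n_samples))
    for test in k_fold_indices(all_indices, outer_k):
        # Outer loop: hold one part out as the test set.
        test_set = set(test)
        train_valid = [i for i in all_indices if i not in test_set]
        for valid in k_fold_indices(train_valid, inner_k):
            # Inner loop: hold one part of the remainder out for validation.
            valid_set = set(valid)
            train = [i for i in train_valid if i not in valid_set]
            yield train, valid, test


splits = list(nested_cv_splits(214))
print(len(splits))  # 10 outer folds x 10 inner folds = 100 combinations
```

Model selection and tuning would use only the train and validation indices, while the test indices of each outer fold would be reserved for estimating the error of the selected model.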
The model with the best accuracy (the proportion of cases classified by the model in agreement with the final decision of the doctor) for each setting was considered optimal. We also recorded the following metrics for each model: algorithm used, sensitivity, false-positive rate (Type I error or 1-Specificity), precision (positive predictive value), F-score (harmonic mean of the precision and sensitivity), Matthews correlation coefficient (correlation between the predicted class and reality), area under the receiver operating characteristic (ROC) curve, area under the precision-recall curve, and Kappa value.
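Most of these metrics follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas for a single positive class ("extraction"); the counts are hypothetical and chosen only to make the arithmetic easy to follow, and the function assumes that no denominator is zero. The two area-under-curve metrics are omitted because they require predicted scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)              # true-positive rate (recall)
    false_positive_rate = fp / (fp + tn)      # 1 - specificity
    precision = tp / (tp + fp)                # positive predictive value
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - chance) / (1 - chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "false_positive_rate": false_positive_rate,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts, not taken from the study.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these symmetric counts, accuracy, sensitivity, precision, and F-score all equal 0.8, while the Matthews correlation coefficient and kappa both equal 0.6, which shows how chance-corrected measures can sit well below raw accuracy.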
RESULTS
Out of the 314 orthodontic treatments available in the analyzed database, 214 met the inclusion criteria; 44% of these patients were female. Extractions were performed in 38% of the cases. Further details on the sample are presented in
Tables 2 and
3.
Five different extraction prediction models were generated per setting (one for each time limit). For the first setting, the accuracies obtained for the 5-, 15-, 30-, and 60-minute and overnight models were 80.37%, 86.45%, 80.37%, 80.37%, and 93.93%, respectively. The last and best result was facilitated by feature selection, in which the AutoML system chose the variables that best predicted the outcome; in this case, these were maxillary arch discrepancy, mandibular arch discrepancy, molar class-modified, Ricketts’ maxillary depth, Ricketts’ facial axis, cephalometric molar relationship, and upper incisor protrusion. Based on these 7 variables, the system automatically created, using a multilayer perceptron algorithm, a model that was later optimized and tested following the nested cross-validation method.
For the second setting, the accuracies were 87.38%, 81.78%, 81.78%, 79.91%, and 84.11% for the respective time limits; the best accuracy for this model was achieved for the 5-minute time limit. Using a feature selection algorithm, the following variables were used for a logistic model tree algorithm: maxillary arch discrepancy, mandibular arch discrepancy, and molar class-modified.
Finally, the third setting had the following accuracies for the respective time limits: 71.96%, 70.56%, 70.09%, 70.56%, and 70.56%. The 5-minute time limit was associated with the highest accuracy, via a Sequential Minimal Optimization algorithm applied after choosing the best features for predicting the outcome. These features included Ricketts’ maxillary depth, Ricketts’ facial axis, cephalometric molar relationship, upper incisor protrusion, and Wits appraisal.
Table 4 shows the other outcomes for each of these settings.
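The selection of the optimal model per setting amounts to taking the run with the highest accuracy among the five time limits. The short sketch below applies that rule to the accuracies reported above; the dictionary keys and the helper name `best_run` are illustrative labels, not identifiers from Auto-WEKA.

```python
TIME_LIMITS = ["5 min", "15 min", "30 min", "60 min", "overnight"]

# Accuracies (%) reported in the text, in the order of TIME_LIMITS.
ACCURACY = {
    "setting 1 (sex + model + cephalometric)": [80.37, 86.45, 80.37, 80.37, 93.93],
    "setting 2 (model only)": [87.38, 81.78, 81.78, 79.91, 84.11],
    "setting 3 (cephalometric only)": [71.96, 70.56, 70.09, 70.56, 70.56],
}


def best_run(accuracies):
    """Return the (time limit, accuracy) pair with the highest accuracy."""
    best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
    return TIME_LIMITS[best_index], accuracies[best_index]


for setting, accuracies in ACCURACY.items():
    limit, value = best_run(accuracies)
    print(f"{setting}: best at {limit} with {value:.2f}%")
```

The output reproduces the pattern described in the text: the overnight run is best for the first setting, whereas the 5-minute run is best for the second and third settings.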
DISCUSSION
This study is the first to explore the performance of an AutoML system for the generation of predictive models for dental extractions for orthodontic treatments using sample sizes comparable to those used for traditional ML systems in the past.
3,4,19-22 Our inclusion and exclusion criteria were similar to those used in previous studies on traditional ML methods,
3,4,19-22 which purposely excluded orthognathic patients and patients with at least one tooth absent at baseline. The AutoML technology allowed us to achieve accuracies comparable to those reported in the literature using traditional ML techniques, albeit with a simplified methodology requiring minimal ML expertise.
This study does not advocate for automated decision-making for the extraction or non-extraction of teeth. We consider these AI tools to be additional resources for the practitioner that can even be used after an initial judgment, provided that some characteristics inherent to these systems are taken into account. First, ML approaches are relatively black boxes, and the contribution of each feature may be difficult to assess;
24 in our case, we determined the variables used, but not the contribution of each of them, despite the significant research advances on making ML models more interpretable.
25 The overall decision will take into account other factors that the tool may not have access to, such as age, previous diseases and dental interventions, presence of root resorption, functional factors, growth potential, patient preferences, as well as clinician-related diagnostic and therapeutic orientations, among others. Hence, like in any intervention involving the wellbeing of humans, the doctor’s critical judgment should remain at the center of the process. An AI-based system should serve as a complementary piece of information.
For our sample, the prevalence of cases of extractions was 38%, which is expected for an orthodontic clinic and is consistent with those reported in the literature, ranging from 50% in England for patients between 18 and 24 years of age
26 to 45.8% in Brazil,
27 40% in China,
3 and 25% in the United States.
28 Although the non-extraction protocol was dominantly used for our sample, a 40:60 proportion (similar to the 38:62 of our study) is usually accepted for a balanced sample. The imbalance in our sample may be considered minimal.
29 The final composition of the sample was determined by the inclusion and exclusion criteria applied to the anonymized database. It was considered that increasing the extraction group could generate a biased sample. Considering the similarities between our sample and that of other studies previously cited on dental extractions and ML,
3,4 we considered this sample suitable for the development of optimal prediction models.
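One way to read an accuracy figure against the class split of a sample is to compare it with the accuracy of a trivial rule that always predicts the majority class. The sketch below computes that reference value for the 38% extraction rate reported here; the function name is hypothetical, and this comparison is offered as a reading aid rather than as part of the study's analysis.

```python
def majority_baseline_accuracy(positive_rate):
    """Accuracy of always predicting the more frequent of two classes."""
    return max(positive_rate, 1.0 - positive_rate)


# 38% of the cases in this sample involved extractions.
baseline = majority_baseline_accuracy(0.38)
print(f"{baseline:.2f}")  # always predicting non-extraction is right 62% of the time
```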
The prediction of the need for extractions as a YES/NO dichotomous variable based on the combination of the model and cephalometric data showed an accuracy of 93.9% and an F1-value of 0.939 using a multilayer perceptron. This result is consistent with those published by other authors using similar methods but with conventional ML instead of AutoML. The reported accuracies ranged from 80 to 94.6%.
3,4,19-22 The study by Li et al.
21 reported a sensitivity of 94.6%, specificity of 93.8%, and an area under the ROC curve of 0.982, while the current study achieved 93.9%, 92%, and 0.915, respectively.
Considering the other two settings, which used only model or only cephalometric data, the accuracy was 15.4 percentage points higher when using only model information than when using only cephalometric information. This suggests that model variables may be more relevant for prescribing extractions than strictly cephalometric data. This may be explained by the fact that tooth extractions are highly affected by maxillary and mandibular arch discrepancies, variables that were included during the feature selection process for both settings 1 and 2. Nonetheless, the best performance was obtained when using both model and cephalometric data, with an accuracy 6.5 percentage points higher than that achieved with only model information and 21.9 percentage points higher than that achieved with only cephalometric data. All the variables automatically selected for setting 2 were also included for setting 1. In the same way, all variables chosen for setting 3, excluding one (Wits appraisal), were also selected by the AutoML system for setting 1, showing high consistency in this regard.
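The differences quoted in this paragraph are absolute differences between accuracies (percentage points), not relative improvements. A minimal check using the best accuracy of each setting from the Results is shown below; the dictionary keys and the helper name `gap` are illustrative.

```python
# Best accuracy (%) per setting, as reported in the Results.
BEST = {"combined": 93.93, "model_only": 87.38, "ceph_only": 71.96}


def gap(first, second):
    """Absolute difference in accuracy, in percentage points."""
    return round(BEST[first] - BEST[second], 2)


print(gap("model_only", "ceph_only"))  # model only vs. cephalometric only
print(gap("combined", "model_only"))   # combined vs. model only
print(gap("combined", "ceph_only"))    # combined vs. cephalometric only
```

The three gaps come out at 15.42, 6.55, and 21.97 percentage points, which agree with the 15.4, 6.5, and 21.9 quoted in the text up to rounding.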
Despite being included in this study, gender was never considered relevant during the feature (variable) selection process by Auto-WEKA. Some variables, such as overjet (OJ) and overbite (OB), were included as both model and cephalometric variables, creating an overlap between these two versions that could traditionally lead to overexpression of this feature in the models. Nevertheless, since the system automatically selects the best features and omits redundant and/or irrelevant variables, we chose to feed the AutoML system with all the variables available in the database and to let the system determine automatically which ones to include in the models. For OJ and OB, no final algorithm used either the model or the cephalometric version of these variables.
Although there is no minimum sample size required to perform ML, and this study used a relatively large database in the context of the published literature on the topic, it is accepted that algorithms reach their greatest potential with larger data sets. From that perspective, the sample for this study was small, as with all other studies published to date in the field. Our sample was obtained from an anonymized databank, and it was not possible to perform a reliability analysis, which is a common issue in the field
3,4,19-22 and is more evident in our study as two practitioners generated the data. Nevertheless, both doctors had worked together for almost two decades and systematically discussed the diagnosis and treatment plans of complex cases (i.e., any borderline extraction or compensatory orthodontic case). We believe that this favors the consistency of the treatment plans of both doctors. The criterion for this process is based on cephalometric skeletal, dental, and soft tissue readings, as well as clinical aspects that lie within a normal range. Once a diagnosis is reached, treatment planning is performed to modify abnormal occlusal characteristics without negatively affecting the soft tissues of the patient, thereby defining the need for extraction or non-extraction treatment protocols. Both clinicians plan their treatments aiming at optimal occlusal and facial results, using mainly active self-ligating brackets and contemporary orthodontic techniques, including temporary anchorage devices whenever deemed necessary. Therefore, the decision to extract or not to extract depended on the clinical presentation of the dentition combined with skeletal, dental, and soft tissue variables. Whenever possible, the non-extraction approach was preferred, as long as satisfactory occlusal and facial balance could be achieved.
The model generated does not represent the analysis of a particular doctor; rather, it is a third, distinct algorithm that may not represent the opinion of either practitioner despite being based on data provided by two orthodontists. Along these lines, future research should include treatments by several orthodontists, which would lead to the generation of a model whose outcomes represent the unbiased recommendations of the professionals included.
Another potential limitation of this study is the degree of overfitting that may have occurred in the models. While the nested cross-validation system greatly reduces overfitting, there can always be a remnant, as with any ML design. This overfitting could explain why some models trained for a few minutes had better performance than those developed for several hours.
Within the decision-making process in orthodontics, the prescription of extractions is of special relevance, given its irreversible nature as well as the fact that it severely conditions both the occlusal and soft tissue response to treatment. In addition, the indication of extractions is based on several factors, including clinical, cephalometric, and sociocultural ones.
30 This multi-factorial nature can make clinical decision-making particularly difficult.
The results of our work suggest a great potential for this readily available AutoML system in the field of orthodontics. Through the development of highly reliable predictive models
21 (necessarily including all the other variables previously named), AutoML systems can, through a simple methodology and minimal ML expertise, assist clinicians in challenging clinical decision-making scenarios such as tooth extractions.
CONCLUSION
Three different models for predicting the need for dental extractions in orthodontic treatment were generated and tested using an AutoML method; they achieved accuracies of up to 93.9%, similar to those obtained by more complex methods.
Prediction models for the need for dental extractions achieve their best performance when model and cephalometric data are combined, although model data seem more relevant.
The use of AutoML systems simplifies the process of model generation, making it less operator-dependent and allowing the generation of several models for accurate predictions.