Abstract
Background
Little research based on the artificial neural network (ANN) has been done on preterm birth (spontaneous preterm labor and birth) and its major determinants. This study uses an ANN to analyze preterm birth and its major determinants.
Methods
Data came from Anam Hospital in Seoul, Korea, with 596 obstetric patients seen between March 27, 2014 and August 21, 2018. Six machine learning methods were applied and compared for the prediction of preterm birth. Variable importance, the effect of a variable on model performance, was used to identify major determinants of preterm birth. Analysis was done in December 2018.
Results
The accuracy of the ANN (0.9115) was similar to those of logistic regression and the random forest (0.9180 and 0.8918, respectively). Based on variable importance from the ANN, major determinants of preterm birth are body mass index (0.0164), hypertension (0.0131) and diabetes mellitus (0.0099) as well as prior cone biopsy (0.0099), prior placenta previa (0.0099), parity (0.0033), cervical length (0.0001), age (0.0001), prior preterm birth (0.0001) and myomas & adenomyosis (0.0001).
Preterm birth, i.e., birth between 20 and 37 weeks of gestation, is the leading cause of disease burden for newborns and children worldwide and in Korea.1,2,3,4 Every year 15 million births are preterm, and preterm birth is the leading cause of neonatal and childhood mortality in the world, responsible for 965,000 neonatal deaths and an additional 125,000 deaths among those aged one to five years. It is estimated that 75% of this mortality can be prevented with cost-effective interventions.1,2 This global pattern is also found in the United States and Korea. Preterm birth affected one of every 10 newborns in the US during 2003–2012, i.e., 5,042,982 (12.2%) of 41,206,315 births.3 Likewise, the proportion of preterm birth rose rapidly from 4.3% to 6.0% in Korea during 2001–2010.4
The causes of preterm birth are still unclear in general, but the following maternal conditions are reported to be risk factors for spontaneous preterm labor and birth as a type of preterm birth (which is referred to as “preterm birth” hereafter for notational convenience): 1) socioeconomic determinants including low education, low income, heavy workload; 2) health-related factors such as body mass index (BMI), diabetes mellitus, hypertensive disorder; and 3) obstetric determinants including infection, parity, vaginal bleeding during pregnancy, the history of abortion, cesarean section, placental abruption, placenta previa and/or preterm birth.5,6,7,8 This line of research is limited by considering only a small subset of the factors listed above and by employing logistic regression with an unrealistic assumption of ceteris paribus, i.e., “all the other variables staying constant”. Using various machine learning methods and deriving variable importance, the effect of a variable on model performance, might be a good alternative to logistic regression.
A recent survey shows the increasing popularity of the artificial neural network (ANN) and a wide variety of its applications across different tasks (e.g., classification vs. prediction vs. pattern recognition) and different sectors (e.g., business vs. finance vs. medicine).9 The popularity and rapid growth of ANN applications can be attributed to the rapid expansion of computing power in the past decades as well as to the fact that the ANN is relatively free from assumptions on models and data, e.g., ceteris paribus (“all the other variables staying constant”), data distribution and relationships among variables.9,10 For instance, the ANN has been known for performance comparable or superior to that of traditional methods such as logistic regression and the random forest regarding the prediction of preterm birth11,12,13 and non-melanoma skin cancer.14 In this context, the purpose of this study is two-fold: 1) analyzing preterm birth and its determinants with an ANN, in contrast to conventional analyses, based on general hospital data with various variables in Korea and 2) comparing the ANN to other machine learning methods in this regard. This study makes a rare attempt to compare the ANN to other popular machine learning methods for the analysis of preterm birth.
Data came from Anam Hospital in Seoul, Korea, with 596 obstetric patients seen between March 27, 2014 and August 21, 2018. The class label (or dependent variable) was spontaneous preterm labor and birth (or preterm birth, i.e., birth between 20 and 37 weeks of gestation, coded as “no” vs. “yes”). Here, labor was defined as regular uterine contraction with cervical change. The cases of indicated preterm birth were excluded. For this purpose, vaginal deliveries with the induction of labor were excluded, and cesarean deliveries were included only in cases in which women had been in labor. The following attributes (or independent variables) were included in this study: 1) demographic factor, i.e., age; 2) health-related determinants such as BMI, drinker (no, yes), smoker (no, yes), diabetes mellitus (no, yes), hypertensive disorder (no, yes); and 3) obstetric variables, i.e., cervical length measured between 18 and 24 weeks of gestation (cm), in vitro fertilization (no, yes), myomas & adenomyosis (no, yes), parity, prior cone biopsy (no, yes), pelvic inflammatory disease history (no, yes), prior preterm birth (no, yes), prior placenta previa (no, yes).
Six machine learning methods were applied and compared for the prediction of preterm birth: ANN, logistic regression, decision tree, naïve Bayes, random forest and support vector machine. A decision tree consists of 1) internal nodes (each meaning a test on an attribute [or independent variable]), 2) branches (each denoting an outcome of the test) and 3) terminal nodes (each representing a class label [or dependent variable]). A naïve Bayesian classifier is a predictor based on Bayes' theorem. A random forest creates many training sets, trains many decision trees and makes a prediction with a majority vote (“bootstrap aggregation”). A support vector machine makes a prediction by finding the hyperplane that separates the data with the maximum margin. The ANN of this study includes one input layer, two hidden layers and one output layer with 4,172 neurons as data units in the input layer, 15 in each hidden layer and 8 in the output layer. Here, 4,172, the number of neurons in the input layer, comes from the multiplication of 14 and 298, the numbers of attributes and observations in the training set, respectively. Neurons in the input or previous hidden layer combine with weights in the next hidden or output layer (feedforward algorithm). Then, the weights in the output layer and its previous hidden layers are adjusted based on how much they contributed to the loss of the ANN, i.e., a gap between the actual and predicted class labels (backpropagation algorithm). Initially the weights are set as small random numbers around 0, and the feedforward and backpropagation algorithms iterate until certain criteria are met for the accurate prediction of a class label.15
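As a minimal illustration of the feedforward step described above, the following Python sketch passes one hypothetical patient record through a network with two hidden layers of 15 units each. It is a sketch under stated assumptions, not the implementation used in this study: it assumes 14 inputs per record (one per attribute), a single sigmoid output unit and initial weights drawn uniformly from [-0.05, 0.05], none of which are specified in the text, and it omits backpropagation. The names init_layer and forward are hypothetical.

```python
import math
import random

# Input-layer size as reported in the text: 14 attributes x 298 training observations.
n_attributes = 14
n_training_observations = 298
n_input_units = n_attributes * n_training_observations  # 4,172

def init_layer(n_in, n_out, rng):
    """Return n_out units, each with n_in weights plus one bias weight.

    Weights are small random numbers around 0 (the range is an assumption).
    """
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(layers, features):
    """Feedforward pass: each unit applies a sigmoid to its weighted input sum."""
    activation = list(features)
    for layer in layers:
        extended = activation + [1.0]  # constant input that multiplies the bias weight
        activation = [
            1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(unit, extended))))
            for unit in layer
        ]
    return activation

rng = random.Random(0)
# Assumed per-record architecture: 14 inputs, two hidden layers of 15, one output.
sizes = [n_attributes, 15, 15, 1]
layers = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]

# Hypothetical, already-scaled attribute values for a single patient.
patient = [rng.random() for _ in range(n_attributes)]
probability = forward(layers, patient)[0]
print("Predicted probability of preterm birth (untrained network):", probability)
```

Because the weights are untrained, the printed probability carries no clinical meaning; the sketch only shows how activations flow from the input layer to the output layer.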
Data on 596 participants were divided into training and validation sets with a 50:50 ratio. The models were built (or trained) based on the training set with 298 observations, and the trained models were then validated based on the validation set with 298 observations. Accuracy, the ratio of correct predictions among 298 observations, was introduced as a criterion for validating the trained models. Variable importance from the ANN, an accuracy gap between a complete model and a model excluding a certain variable, was used for identifying major determinants of preterm birth. A greater “accuracy decrease” means greater variable importance. This derivation is in a similar context but opposite direction to its random-forest counterpart: variable importance from the random forest is a mean-impurity gap between a complete model and a model excluding a certain variable (mean impurity, or the degree of data being mixed at a node on average, is inversely related to accuracy). A greater “mean-impurity increase” means greater variable importance. Python 3.52 was employed for the analysis in December 2018.
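The 50:50 split and the accuracy criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only: the record identifiers and the labels in the final example are placeholders rather than study data, the random seed is arbitrary, and the helper names split_half and accuracy are hypothetical.

```python
import random

def split_half(records, seed=0):
    """Shuffle the records and split them 50:50 into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def accuracy(actual, predicted):
    """Ratio of correct predictions among all observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Placeholder identifiers standing in for the 596 participants.
participants = list(range(596))
training_set, validation_set = split_half(participants)
print(len(training_set), len(validation_set))

# Toy labels (0 = no preterm birth, 1 = preterm birth), not study data.
toy_actual = [0, 1, 1, 0]
toy_predicted = [0, 1, 0, 0]
print(accuracy(toy_actual, toy_predicted))
```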
Table 1 shows descriptive statistics for participants' preterm birth and attributes. Among 596 participants, 43 (7.21%), 55 (9.23%), 14 (2.35%) and 9 (1.51%) had preterm birth, hypertension, diabetes mellitus and prior cone biopsy, respectively. On average, the age, BMI, cervical length (cm) and parity of the participants were 32.68, 26.13, 4.04 and 0.44, respectively. Based on Table 2, the accuracy of the ANN (0.9115) was similar to those of logistic regression and the random forest (0.9180 and 0.8918, respectively). In addition, variable importance from the ANN was derived by subtracting the accuracy of the model excluding a certain variable [e.g., 0.8984 for the ANN excluding hypertension] from the accuracy of the model with all variables (the ANN Full) [0.9115].
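The subtraction described above can be written out as a short worked example. The two accuracies are the values reported in the text; the function name variable_importance and the rounding to four decimals are assumptions made for illustration.

```python
def variable_importance(full_accuracy, reduced_accuracy, digits=4):
    """Accuracy decrease when one variable is excluded from the model."""
    return round(full_accuracy - reduced_accuracy, digits)

# Values reported in the text.
ANN_FULL_ACCURACY = 0.9115
ANN_ACCURACY_WITHOUT_HYPERTENSION = 0.8984

hypertension_importance = variable_importance(
    ANN_FULL_ACCURACY, ANN_ACCURACY_WITHOUT_HYPERTENSION
)
print(hypertension_importance)  # 0.0131, the importance reported for hypertension
```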
According to the variable importance from the ANN (Fig. 1), major determinants of preterm birth are BMI (0.0164), hypertension (0.0131) and diabetes mellitus (0.0099) as well as prior cone biopsy (0.0099), prior placenta previa (0.0099), parity (0.0033), cervical length (0.0001), age (0.0001), prior preterm birth (0.0001) and myomas & adenomyosis (0.0001). The results of logistic regression (Supplementary Table 1) provide useful information about the sign and magnitude of the effect of each major determinant on preterm birth. For example, the odds of preterm birth are higher by 48% for those with hypertension than for those without it, whereas the odds are 79 times as high for those with diabetes mellitus as for those without it. Likewise, the odds of preterm birth are higher by 85% for those with prior cone biopsy than for those without it, while the odds almost quadruple if the cervical length decreases by 1 centimeter. It needs to be noted, however, that the findings of logistic regression are based on an unrealistic assumption of ceteris paribus, i.e., “all the other variables staying constant.” For this reason, the results of logistic regression need to be considered as just supplementary information to the variable importance from the ANN.
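To make the odds-based statements above concrete, the following Python sketch converts an odds ratio into a logistic-regression coefficient and applies it to a baseline probability. The odds ratio of 1.48 corresponds to “higher by 48%” in the text. The baseline of 43/596 is the overall proportion of preterm birth in the sample and is used here only as an illustrative assumption, since the proportion among women without hypertension is not given; the function names are hypothetical.

```python
import math

def odds_ratio_to_coefficient(odds_ratio):
    """A logistic-regression coefficient is the natural log of the odds ratio."""
    return math.log(odds_ratio)

def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a probability to odds, scale the odds, and convert back."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Illustrative baseline: overall sample proportion of preterm birth (assumption).
baseline = 43 / 596
hypertension_odds_ratio = 1.48  # "higher by 48%"

coefficient = odds_ratio_to_coefficient(hypertension_odds_ratio)
probability_with_hypertension = apply_odds_ratio(baseline, hypertension_odds_ratio)
print(round(baseline, 4), round(probability_with_hypertension, 4))
```

Under this assumed baseline, a 48% increase in the odds moves the probability from about 0.07 to about 0.10, which shows that a change in odds is not the same as an equal change in probability.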
Based on the variable importance from the random forest (Fig. 2), main factors for preterm birth are BMI (0.3172), cervical length (0.2674) and age (0.2590) as well as prior preterm birth (0.0495), parity (0.0449), hypertension (0.0205), myomas & adenomyosis (0.0186), diabetes mellitus (0.0148), prior cone biopsy (0.0074) and in vitro fertilization (0.0028). The results of the ANN and the random forest both highlight the significance of BMI, hypertension, diabetes mellitus, prior cone biopsy, parity, cervical length, age, prior preterm birth and myomas & adenomyosis. But the ANN outcomes put more focus on hypertension, diabetes mellitus and prior cone biopsy whereas their random-forest counterparts place more emphasis on cervical length, age and prior preterm birth. Finally, Fig. 3 shows the receiver-operating-characteristic (ROC) curves of the ANN and the random forest, the plots of the true positive rate (or sensitivity) vs. the false positive rate (or 1-specificity). The area under the ROC curve (AUC) measures the power or usefulness of the model. Based on the measure, the ANN and the random forest might be useful models: their respective AUCs, i.e., 0.62 and 0.64, are comparable to those of the ANNs in similar studies with 19,910 participants at the Duke University Medical Center between January 1, 1988 and June 1, 1997, i.e., 0.64–0.68.11,12,13
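For readers unfamiliar with the AUC, the following Python sketch computes it from class labels and predicted scores using the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are toy values, not study data, and the function name roc_auc is hypothetical; this is one standard way to obtain the AUC, not necessarily the procedure used in this study.

```python
def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Toy example (1 = preterm birth, 0 = no preterm birth).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to ranking cases no better than chance and 1.0 to perfect ranking, which places the values of 0.62 and 0.64 reported above in context.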
In summary, the accuracy of the ANN was similar to those of logistic regression and the random forest for the prediction of preterm birth. Based on variable importance from the ANN, major determinants of preterm birth are BMI, hypertension and diabetes mellitus as well as prior cone biopsy, prior placenta previa, parity, cervical length, age, prior preterm birth and myomas & adenomyosis. The following maternal risk factors for spontaneous preterm labor and birth as a type of preterm birth (which is referred to as “preterm birth” in this study for notational convenience) have been reported: socioeconomic determinants including low education, low income, heavy workload; health-related factors such as BMI, diabetes mellitus, hypertensive disorder; and obstetric determinants including infection, parity, vaginal bleeding during pregnancy, the history of abortion, cesarean section, placental abruption, placenta previa and/or preterm birth.
This study used an ANN for analyzing preterm birth and its determinants based on general hospital data in Korea. This study also covered a large set of demographic, health-related and obstetric determinants of preterm birth, i.e., fourteen. The ANN results put more focus on indirect determinants of preterm birth (i.e., hypertension, diabetes mellitus, prior cone biopsy) whereas their random-forest counterparts place more emphasis on direct factors for preterm birth (i.e., cervical length, age, prior preterm birth). As explained in the Methods section, the feedforward and backpropagation algorithms with constant learning (i.e., continued updates of weights) iterate in the ANN until certain criteria are met for the accurate prediction of a class label. This unique process of the ANN might lead to outcomes distinct from those of other machine learning methods including the random forest. This distinction of the ANN from the random forest in this study might suggest a shift of focus for clinical implications from direct determinants of preterm birth (i.e., cervical length, age, prior preterm birth) to their indirect counterparts (i.e., hypertension, diabetes mellitus, prior cone biopsy). For example, based on the ANN results of this study, preventive measures for hypertension and diabetes mellitus might need more attention for managing the health conditions of pregnant women and prospective newborns.
Several studies report a strong association of preterm birth with BMI, hypertension and diabetes mellitus during pregnancy.16,17,18,19,20,21 For instance, women with diabetes mellitus had a significantly higher rate of preterm birth than did those without the disease (odds ratio, 1.6; 95% confidence interval, 1.2–2.2).19 The risk of preterm birth also increased with the level of pregnancy glycemia in a dose-dependent pattern.20 This dose-dependent pattern was found in pregnancy-related hypertension as well: the risk of preterm birth was associated with an excessive rise in blood pressure between early and late periods of pregnancy.21 Different studies suggest different pathways among these variables, and identifying a correct mechanism is beyond the scope of this study. However, this study confirms their overall results, suggesting that early interventions for hypertension and diabetes mellitus during or even before pregnancy might be vital for preventing preterm birth and protecting maternal health. More effort should be made to develop effective prevention programs based on rigorous clinical trials and to promote these programs among all risk groups.
Another clinical implication from the ANN outcomes in this study might be the development and promotion of effective cervical-length screening based on the information of prior cone biopsy. Shortened cervical length is reported to be a strong predictor of preterm birth22,23 and this report is consistent with the findings of the random forest in this study. The results of several studies also agree with those of the ANN in this study, indicating that shortened cervical length is associated with prior obstetric history (e.g., the scope/type of cervical conization, time from excision to pregnancy) and that cervical-length screening needs to reflect these conditions.24,25,26,27 The Society for Maternal-Fetal Medicine in the United States recommends 1) routine transvaginal cervical-length screening for women with singleton pregnancy and prior spontaneous preterm birth (GRADE 1A), 2) strict guidelines for practitioners who decide to implement universal cervical-length screening (GRADE 2B), and 3) specific training for sonographers and/or practitioners regarding the acquisition and interpretation of cervical imaging during pregnancy (GRADE 2B).23 Similar guidelines need to be developed based on the scope/type of prior conization and time from excision to pregnancy, and this preventive measure needs to be promoted among all risk groups for preterm birth.
This study had some limitations. Firstly, this study used a cross-sectional design because of limited data availability. Expanding the data with a longitudinal design is expected to further improve the accuracy of the ANN. Secondly, expanding this study to the following variables might make a great contribution to this line of research: socioeconomic factors such as education, income and workload; and other obstetric determinants including infection, periodontal disease, vaginal bleeding during gestation and the interval between pregnancies. Thirdly, this study did not consider detailed information on several variables including the types of diabetes mellitus, the size and shape of myomas & adenomyosis, the excision depth or volume in prior cone biopsy and the microbiologic information of prior pelvic inflammatory disease. Fourthly, this study did not consider possible mediating effects among variables. Fifthly, this study used a small sample from a single center. Expanding this study to big data will be a good topic for future research. Sixthly, the AUC of the ANN in this study (0.62) was comparable to those of the ANNs in similar studies in the past (0.64–0.68)11,12,13 but it needs to be noted that there is a lot of room for further improvement. Accurate classification and prediction of preterm birth are considered to be a very challenging task given the great variety of potential factors and the continued absence of reliable data on them. In this context, the AUC of the ANN in this study might be a good starting point for further research, although the performance of the model might not yet be adequate for a diagnostic test and more effort should be made in this regard. Finally, further analysis of specific patients, e.g., symptomatic vs. asymptomatic, low vs. high risk, single vs. multiple gestation, might have offered more insight into this line of research with more detailed clinical implications.
In conclusion, for preventing preterm birth, preventive measures for hypertension and diabetes mellitus might be needed alongside the promotion of cervical-length screening with different guidelines across the scope/type of prior conization.
References
1. World Health Organization. News: preterm birth. Updated February 19, 2018. Accessed December 1, 2018. http://www.who.int/news-room/fact-sheets/detail/preterm-birth.
2. Harrison MS, Goldenberg RL. Global burden of prematurity. Semin Fetal Neonatal Med. 2016; 21(2):74–79.
3. Magro Malosso ER, Saccone G, Simonetti B, Squillante M, Berghella V. US trends in abortion and preterm birth. J Matern Fetal Neonatal Med. 2018; 31(18):2463–2467.
4. Lee NH. International trends and implications for preterm birth. Health Soc Welf Forum. 2013; (200):116–127.
5. Kim YJ, Lee BE, Park HS, Kang JG, Kim JO, Ha EH. Risk factors for preterm birth in Korea: a multicenter prospective study. Gynecol Obstet Invest. 2005; 60(4):206–212.
6. Di Renzo GC, Giardina I, Rosati A, Clerici G, Torricelli M, Petraglia F, et al. Maternal risk factors for preterm birth: a country-based population analysis. Eur J Obstet Gynecol Reprod Biol. 2011; 159(2):342–346.
7. Boghossian NS, Yeung E, Albert PS, Mendola P, Laughon SK, Hinkle SN, et al. Changes in diabetes status between pregnancies and impact on subsequent newborn outcomes. Am J Obstet Gynecol. 2014; 210(5):431.e1–431.e14.
8. Premkumar A, Henry DE, Moghadassi M, Nakagawa S, Norton ME. The interaction between maternal race/ethnicity and chronic hypertension on preterm birth. Am J Obstet Gynecol. 2016; 215(6):787.e1–787.e8.
9. Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H. State-of-the-art in artificial neural network applications: a survey. Heliyon (Lond). 2018; 4(11):e00938.
10. Song X, Mitnitski A, Cox J, Rockwood K. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Stud Health Technol Inform. 2004; 107(Pt 1):736–740.
11. Goodwin LK, Maher S. Data mining for preterm birth prediction. In: Proceedings of the 2000 ACM Symposium on Applied Computing; March 19-21, 2000; Villa Olmo, Italy. New York, NY: Association for Computing Machinery; p. 46–51.
12. Goodwin LK, Iannacchione MA, Hammond WE, Crockett P, Maher S, Schlitz K. Data mining methods find demographic predictors of preterm birth. Nurs Res. 2001; 50(6):340–345.
13. Goodwin LK, Iannacchione MA. Data mining methods for improving birth outcomes prediction. Outcomes Manag. 2002; 6(2):80–85.
14. Roffman D, Hart G, Girardi M, Ko CJ, Deng J. Predicting non-melanoma skin cancer via a multi-parameterized artificial neural network. Sci Rep. 2018; 8(1):1701.
15. Han J, Kamber M. Data Mining: Concepts and Techniques. 2nd ed. San Francisco, CA: Elsevier; 2006.
16. Parker MG, Ouyang F, Pearson C, Gillman MW, Belfort MB, Hong X, et al. Prepregnancy body mass index and risk of preterm birth: association heterogeneity by preterm subgroups. BMC Pregnancy Childbirth. 2014; 14(1):153.
17. Heude B, Thiébaugeorges O, Goua V, Forhan A, Kaminski M, Foliguet B, et al. Pre-pregnancy body mass index and weight gain during pregnancy: relations with gestational diabetes and hypertension, and birth outcomes. Matern Child Health J. 2012; 16(2):355–363.
18. Shin D, Song WO. Prepregnancy body mass index is an independent risk factor for gestational hypertension, gestational diabetes, preterm labor, and small- and large-for-gestational-age infants. J Matern Fetal Neonatal Med. 2015; 28(14):1679–1686.
19. Sibai BM, Caritis SN, Hauth JC, MacPherson C, VanDorsten JP, Klebanoff M, et al. Preterm delivery in women with pregestational diabetes mellitus or chronic hypertension relative to women with uncomplicated pregnancies. The National Institute of Child Health and Human Development Maternal-Fetal Medicine Units Network. Am J Obstet Gynecol. 2000; 183(6):1520–1524.
20. Hedderson MM, Ferrara A, Sacks DA. Gestational diabetes mellitus and lesser degrees of pregnancy hyperglycemia: association with increased risk of spontaneous preterm birth. Obstet Gynecol. 2003; 102(4):850–856.
21. Zhang J, Villar J, Sun W, Merialdi M, Abdel-Aleem H, Mathai M, et al. Blood pressure dynamics during pregnancy and spontaneous preterm birth. Am J Obstet Gynecol. 2007; 197(2):162.e1–162.e6.
22. O'Hara S, Zelesco M, Sun Z. Cervical length for predicting preterm birth and a comparison of ultrasonic measurement techniques. Australas J Ultrasound Med. 2013; 16(3):124–134.
23. McIntosh J, Feltovich H, Berghella V, Manuck T. Society for Maternal-Fetal Medicine (SMFM). The role of routine cervical length screening in selected high- and low-risk women for preterm birth prevention. Am J Obstet Gynecol. 2016; 215(3):B2–B7.
24. Berghella V, Pereira L, Gariepy A, Simonazzi G. Prior cone biopsy: prediction of preterm birth by cervical ultrasound. Am J Obstet Gynecol. 2004; 191(4):1393–1397.
25. Bevis KS, Biggio JR. Cervical conization and the risk of preterm delivery. Am J Obstet Gynecol. 2011; 205(1):19–27.