Introduction
Over the past decades, crimes and pandemics have posed a real challenge for experts seeking to establish biological profiles of human remains. Over 1,800 bodies from the 2004 Indian Ocean tsunami remained unidentified 7 months after the incident [1]. In addition, a growing number of murder victims have yet to receive justice. According to the Thai Ministry of Justice, over 4,000 unnamed bodies were reported over 17 years (2002–2019) [2]. The European Union and the United States face the same problem: the EU has accumulated more than 1,500 bodies [3], while the US reports 4,000 bodies per year [4]. In all of these countries, money and resources are spent on storing and managing the bodies, especially in Thailand, where they have to be kept for up to 20 years [2]. However, the major portion of those bodies belong to people who have been reported missing. To improve the situation, a more effective way of reconstructing a person's identity is necessary, and this is where age plays a crucial role.
Bone has long been recognized as one of the five key elements in the identification process [5]. Its relationship with age and its persistence make it the best-preserved material remaining at a crime scene. Forensic anthropologists have used various bones as age indicators, for example the pelvis, cranium, femur, and clavicle [6-8]. Because clavicles bear little weight, take the longest time to mature, and are easy to access, they are considered the best candidate for accurate age estimation. This is consistent with the study by Marera and Satyapal [9], which found that estimating age from the clavicle yields the most accurate result. However, current methods mainly rely on observing epiphyseal plate fusion of the clavicle, which can only predict age in adolescents and young adults. In addition, they require specialists to interpret the result, which introduces subjectivity and is time consuming [10]. A widely used alternative to epiphyseal plate union in bone age estimation is bone mineral density (BMD). However, Botha et al. [11] stated that BMD in Black individuals does not significantly correlate with age, which disagrees with previous research. Two other methods used in this field, cortical thickness and histomorphometry, are impractical in real-life application because of inconsistent methodology and sample variety. Moreover, histomorphometry is destructive to the specimen and should be regarded only as an alternative approach to forensic human identification.
Imaging techniques used to estimate bone age include conventional radiography, digital X-ray radiography (DXR), dual-energy X-ray absorptiometry (DXA), computed tomography (CT), and magnetic resonance imaging (MRI) [12,13]. DXA, CT, and MRI are considered hard to obtain and not cost-efficient in our country. Consequently, digital radiography, which is easy to obtain and cost-efficient, was chosen. Furthermore, DXR provides clearer and more accurate images than conventional radiography. Imaging-based age estimation rests on the knowledge that bone density deteriorates over time through the mechanisms of bone modeling and remodeling. As we age, the remodeling ability diminishes; together with changes in hormonal response and reduced production of calcitriol, this results in increased bone porosity [14].
Artificial intelligence is the ability of a computer or robot to perform tasks that are usually done by humans. Deep learning is a type of artificial intelligence that imitates the way humans gain certain types of knowledge. Two main types of deep learning network are the artificial neural network (ANN) and the convolutional neural network (CNN). An ANN is a network that resembles the human neural network. After receiving signals from the data, neurons weight their inputs and pass the result from one layer to the next. If the system detects an error, it adjusts the weights of that layer. A CNN is a system that learns mainly by pattern recognition and is widely used in image recognition tasks such as classification and object detection. It has been used to analyze medical images in various fields, for example prediction of blood glucose level from electrocardiograms [15], evaluation of lung nodules on chest CT, and classification of ear abnormalities.
Thus, we aimed to use a CNN (GoogLeNet; Google Inc., Mountain View, CA, USA) with digital radiographs to create a more convenient, refined, and non-subjective method of estimating age from the medial clavicle in a Thai population, which has not yet been done. This would promote precise personal identification and help the forensic team return justice to the dead.

Results
A total of 3,684 images of male samples, remaining from 3,999 images after exclusion of those that did not resemble the clavicular structure, were used for this model, with an age at death ranging from 30 to 100 years. A training dataset was used to train the CNN models, and a validation dataset was used to avoid overfitting. The original dataset was randomly divided into 75% for training and 25% for validation.
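The random 75%/25% division described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_dataset`, the fixed seed, and the use of integer image identifiers are hypothetical, and the sketch assumes that the 3,684 images are the set being divided. It is not the authors' actual procedure.

```python
import random


def split_dataset(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of `items` and cut it into training and validation parts.

    `seed` is an illustrative choice so that the split is reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical image identifiers standing in for the 3,684 radiograph images.
image_ids = list(range(3684))
train_ids, val_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids))
```

A purely random split like this one does not guarantee that every age class keeps the same 75/25 proportion; a stratified split would be one way to enforce that if needed.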
Fig. 2 shows the confusion matrices for the training and validation datasets of the models. The proportion of correct age predictions was 96.67%, 59.17%, 62.5%, 93.33%, and 88.33% for the 40–49, 50–59, 60–69, 70–79, and 80–89 classes, respectively. Because the 30–39 and 90–100 age groups had small sample sizes, the automatic random selection excluded these two groups from testing, so they do not appear in the confusion matrix (Fig. 2).
Fig. 2. Confusion matrix of the predicted age range.
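Per-class percentages of the kind reported for Fig. 2 are usually read off a confusion matrix by dividing each diagonal entry by its row total. The sketch below shows that calculation. The 3×3 matrix, the class labels "A" to "C", and the function names are placeholders invented for illustration; they are not the study's data.

```python
def per_class_accuracy(matrix, labels):
    """Fraction of correctly predicted cases per true class.

    Rows of `matrix` are true classes, columns are predicted classes.
    """
    result = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        result[label] = matrix[i][i] / row_total if row_total else float("nan")
    return result


def overall_accuracy(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Placeholder counts for three hypothetical classes (not from the study).
example_labels = ["A", "B", "C"]
example_matrix = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]
class_acc = per_class_accuracy(example_matrix, example_labels)
total_acc = overall_accuracy(example_matrix)
```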
Fig. 3 shows the training progress, accuracy, and loss of the final CNN model. The highest validation accuracy was 89.02%, obtained with the following hyperparameters for the age estimation model: InitialLearnRate 0.0002, L2regularization 0.0002, MaxEpochs 50, MiniBatchSize 64, Momentum 0.99, and Validation frequency 25 (Fig. 3).
Fig. 3. Graph of training accuracy and loss for the final CNN model. CNN, convolutional neural network.
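To make the roles of the learning rate, L2 regularization, and momentum values concrete, the sketch below applies one parameter update of stochastic gradient descent with momentum. The text does not name the optimizer, so treating it as SGD with momentum is an assumption suggested only by the presence of a momentum value. The dictionary, the key `ValidationFrequency`, and the function `sgdm_step` are hypothetical names used for illustration.

```python
# Hyperparameter values as reported in the text; the dictionary itself
# and the key spelling "ValidationFrequency" are illustrative.
HYPERPARAMS = {
    "InitialLearnRate": 0.0002,
    "L2regularization": 0.0002,
    "MaxEpochs": 50,
    "MiniBatchSize": 64,
    "Momentum": 0.99,
    "ValidationFrequency": 25,
}


def sgdm_step(weights, grads, velocity, hp=HYPERPARAMS):
    """One SGD-with-momentum update with L2 weight decay (assumed optimizer).

    velocity <- momentum * velocity - learn_rate * (grad + l2 * weight)
    weight   <- weight + velocity
    """
    lr = hp["InitialLearnRate"]
    l2 = hp["L2regularization"]
    mu = hp["Momentum"]
    new_velocity = [
        mu * v - lr * (g + l2 * w)
        for w, g, v in zip(weights, grads, velocity)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy single-weight example: weight 1.0, gradient 0.5, zero initial velocity.
step_weights, step_velocity = sgdm_step([1.0], [0.5], [0.0])
```

With a momentum of 0.99, most of each update comes from the accumulated velocity rather than from the current gradient, which is why the small learning rate can still produce steady progress over many iterations.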
The testing image dataset was assessed by the network to evaluate the performance of the trained model. When the CNN model was tested one image at a time on a testing dataset of 120 hidden images, it predicted the correct age with 30.0% accuracy (36 of 120 cases correctly assigned).
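The test accuracy above is simple arithmetic (36 correct out of 120). With only 120 test images, the uncertainty around that proportion may also be worth checking. The sketch below computes the accuracy and a Wilson score interval. The interval is an addition for illustration and is not part of the reported analysis; the function names and the choice of z = 1.96 (approximately 95% coverage) are assumptions.

```python
import math


def accuracy(correct, total):
    """Proportion of correctly classified cases."""
    return correct / total


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage (illustrative choice).
    """
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return center - half_width, center + half_width


test_accuracy = accuracy(36, 120)
ci_low, ci_high = wilson_interval(36, 120)
print(f"accuracy = {test_accuracy:.1%}, interval = ({ci_low:.3f}, {ci_high:.3f})")
```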

Discussion
The clavicle is considered the bone of choice for age estimation because of its universal reliability compared with other bones. It has been shown to be the most reliable indicator for age estimation in radiological studies [17]. The medial part of the clavicle is widely used in forensic practice because it is less affected by weight than the body and the acromial part. In postmortem conditions, the medial clavicle is also the least damaged part, whereas the acromial part is highly fragile, especially in the elderly [18]. The body is markedly affected by force and has different cortical components and properties from the acromial and sternal ends. Accordingly, it is the most frequent location of fracture (70%–80%) [9].
In the current study, a 3×3 cm area at the sternal end of the clavicle was selected as the region of interest because the medial end is the best-preserved part and accurately depicts the bone cortex and trabeculae, which represent bone porosity. Physiologically, the body balances bone degeneration against remodeling, and remodeling decreases with age [13]. This results in increased porosity and decreased bone mass in the elderly [19]. In our setting, marking the region of interest with anatomical landmarks on the radiographs, such as the costoclavicular ligament [18], was not possible because of technical limitations and radiograph quality.
Currently, the clavicle is used in age estimation by observing the gross appearance of epiphyseal plate closure, which is divided into 4 stages (Webb and Suchey classification) according to the completeness of epiphyseal union. Although this method is very accurate, it can only estimate age in adolescents and young adults (15–29 years old in the Thai population) [19]. The Study Group on Forensic Age Diagnostics (Arbeitsgemeinschaft für Forensische Altersdiagnostik, AGFAD) recommended radiographic images of the medial clavicular epiphyses as one line of evidence for determining whether the legal age of adulthood has been reached [18,20,21], while estimation in other age ranges is still debated [22]. Thus, clavicles from individuals aged 30–100 years were chosen in search of an alternative way to estimate age accurately in this range, and clavicles from individuals under 30 years were excluded from this study.
According to Marera and Satyapal [9], age assessment using clavicular epiphyseal plate closure was more precise in males than in females. This is consistent with the fact that females are affected by osteoporosis more often and earlier than males because of smaller bone size and the effect of estrogen on bone resorption with advancing age [23]. Estrogen acts mainly on trabecular bone [24]; male and female individuals therefore show different patterns of osteoporosis and should yield different patterns on imaging. For this reason, male and female bones were not combined into a single dataset for training the deep learning model.
Our study also shows that when female samples were included, the accuracy of age estimation using deep learning was 25.9%, which is lower than for males alone (31.9%). In addition, our Forensic Osteology Research Center houses a bone collection of 278 Thai Mongoloid males, twice the number of females. Consequently, unlike previous studies, we focused on males.
Building on the knowledge that bone porosity increases with age, many studies now use various imaging techniques to detect bone porosity or bone mass for age estimation. These methods could greatly expand the age range that can be assessed and are more practical than the epiphyseal method. According to Benito et al. [18], the average grey level and the cortical thickness of the clavicle on radiographs show a significant negative correlation with age. Similar results were obtained by Botha et al. [11] and Navega et al. [25], who used BMD from DXA as an indicator. However, these two studies were conducted on the femur rather than the clavicle.
The advancement of deep learning in recent years, especially the development of a type of neural network called the CNN, has benefited the medical field through pattern recognition [23]. In forensic work, CNNs have mainly been studied as a tool for age and sex assessment [26]. Images from several modalities, including X-ray, DXA, MRI, and CT, are applied to various networks, newly developed or pre-trained, to obtain the highest-performing model [27]. Navega et al. [25] succeeded in combining deep learning with DXA imaging of the femur to estimate age at death in Caucasian females. In Mongoloid populations, studies of this type are still limited.
In this study, left and right clavicles were analyzed together because we wanted the deep learning model to be able to estimate age at death from either side. This would allow the model to be applied in real-life situations, where a clavicle from either side may be found at a crime scene.
Our first few experiments, which used only original male clavicular images, yielded only 68% validation accuracy despite achieving 100% training accuracy. This indicates overfitting, which, in layman's terms, means that the AI model has learned in a manner that applies only to the training sample and no longer generalizes to the overall population. Ideally, to overcome this problem, more training data should be collected until the general starting point of 1,000 images per category is reached [28]. However, in the medical field this task is nearly impossible for various reasons, such as ethical and privacy concerns, the high cost of obtaining data, the need for data labelling by specialists, and the rare incidence of some diseases [29-31]. Although several open-access databases offer medical images to researchers, our study, which focuses on a Thai population, may not benefit from them. To tackle the insufficient data, our study used data augmentation, a technical solution widely adopted in the medical imaging field [31-33].
Generative adversarial networks (GANs), a machine learning framework composed of a generator network and a discriminator network, are extensively used for medical image synthesis [32]. Many studies have evaluated the quality of GAN-generated data by using it to train other CNNs for various tasks. Frid-Adar, whose study focused on image classification, reported that specificity improved from 88.4% to 92.4%. Another study on lung cytology images also reported a better outcome after using a GAN, with a 4.3% rise in accuracy. Other studies have shown promising results as well [32,34,35]. These findings illustrate the generalizability of GANs, which can therefore be applied to clavicular radiographs. Our result is in concordance with these studies, as we found a 5% increase in validation accuracy on the testing data.
Pattern recognition is the core concept of image classification by CNNs, which implies that images of better quality allow distinguishing features to be extracted more accurately [33]. In his research, Samuel Dodge found that noise and blur have a negative impact on the performance of neural networks, including GoogLeNet. However, training a CNN with low-quality images does not always decrease network performance. If the trained network is tested on images of similar quality, its performance is likely to improve. Conversely, if the test set contains higher-resolution data, the network's accuracy tends to plummet [33]. Our study was in this situation: the majority of the images used for training (3,186 of 3,684, or 86.5%) were derived from the GAN and had visibly lower resolution, whereas the test set consisted only of original images.
This could explain the wide gap between the accuracy achieved during training and that on the test set, a sharp decrease from 87% to 30%. The problem could be addressed by using a GAN with optimal fine-tuning for better resolution, but this was not possible because of hardware and time limitations. Another explanation for our finding is the unequally distributed data [36]. The confusion matrix (Fig. 2) shows that our network performed best in the category with the most images and that most of the false positives fell into that same category. Additional information that was not included in the assessment, such as socioeconomic status, nutrition, occupation, and other factors, would be beneficial to our study.
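The composition of the training images mentioned earlier (3,186 GAN-derived images out of 3,684) can be verified with a few lines of arithmetic. The sketch below is a minimal check; the variable and function names are hypothetical, and the count of original images is simply the difference between the two reported figures.

```python
TOTAL_IMAGES = 3684       # images used for the model, as reported
GAN_IMAGES = 3186         # images derived from the GAN, as reported
ORIGINAL_IMAGES = TOTAL_IMAGES - GAN_IMAGES  # derived here, not reported


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


gan_share = share(GAN_IMAGES, TOTAL_IMAGES)
original_share = share(ORIGINAL_IMAGES, TOTAL_IMAGES)
print(f"GAN-derived: {gan_share:.1f}%, original: {original_share:.1f}%")
```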
Our initial hypothesis was that decreasing trabecular density, a known indicator of increasing age, would correlate with the overall grey level of clavicular radiographs, allowing the network to recognize a different pattern for each age group. However, with our imaging protocol, digital radiographs may not capture the minimal variation in grey level between age groups, resulting in poor categorization [18]. Further studies using other imaging techniques, such as MRI or CT, should be considered, as they would allow the CNN to be trained with higher-quality images.
Our preliminary study has given an insight into what CNNs could achieve in the field of age-at-death estimation. Even though the accuracy on the test set is quite low, the results show that a CNN model could be applied to age estimation in a Thai population sample and could reduce subjectivity as well as measurement error. The results show the possibility of using a CNN as part of an identification tool, although the study faces the major limitation of a small dataset. The accuracy of the network is expected to increase as more data and resources become available. In the future, we expect to collaborate with other bone collections to expand our sample and obtain more accurate results. In addition, more extensive network modification with advanced techniques would result in an even more practical model. Successful further development will present an opportunity for real-life application of the technology.
