
Korean J Leg Med. 2014 May;38(2):59-65. Korean.
Published online May 28, 2014.
© Copyright 2014 by the Korean Society for Legal Medicine
Searching for Appropriate Statistical Parameters for Validation of Mitochondrial DNA Database
Chong Min Choung,1 Ji Hyun Lee,2 Sohee Cho,2 and Soong Deok Lee2,3
1Forensic DNA Division, National Forensic Service, Wonju-si, Gangwon, Korea.
2Department of Forensic Medicine, Seoul National University College of Medicine, Seoul, Korea.
3Institute of Forensic Science, Seoul National University College of Medicine, Seoul, Korea.

Corresponding author (Email: )
Received April 29, 2014; Revised May 19, 2014; Accepted May 20, 2014.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Recently, studies on mitochondrial DNA (mtDNA) have increased rapidly. Conventional parameters, such as the diversity index and pairwise comparison, are used to interpret and validate data on autosomal DNA; however, their use for validating data from mitochondrial DNA databases (mtDNA DBs) needs to be verified because mtDNA has a different transmission pattern. This study was conducted to verify the use of these conventional parameters and to test the "coverage concept" as a new parameter. The mtDNA DB is not very large; nevertheless, it is necessary to check how the parameters change with DB size. For this, we artificially rearranged a Korean DB into several smaller sub-DBs of various sizes. The results show that the diversity in nucleotide variations and the number of different haplotypes did not vary as the size of the DB increased. However, the "coverage" changed substantially, increasing from 0.113 in a DB of 100 people to 0.260 in a DB of 653 people. Additionally, using the "coverage concept", we predicted how the total number of haplotypes changed with the sub-DB size and compared the predicted result with the final result. In conclusion, "coverage", in addition to conventional statistical parameters, can be used to check the usability of an mtDNA DB. Finally, we tried to predict the total number of mtDNA haplotypes in Korea using the "saturation concept".
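The sub-DB comparison described in the abstract can be illustrated with a short sketch. It assumes, since the abstract does not state the estimator used, that coverage is estimated by the singleton-based (Good-Turing) formula 1 - f1/n common in the sample-coverage literature, and that diversity is Nei's unbiased haplotype diversity. The function names, the toy database, and the sub-DB sizes are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (n_distinct, diversity, coverage) for a list of haplotype labels.

    Assumptions (not confirmed by the abstract):
      diversity = n/(n-1) * (1 - sum(p_i^2))   (Nei's unbiased estimator)
      coverage  = 1 - f1/n                     (f1 = haplotypes seen exactly once)
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1.0 - sum_sq)
    singletons = sum(1 for c in counts.values() if c == 1)
    coverage = 1.0 - singletons / n
    return len(counts), diversity, coverage


def sub_db_curve(haplotypes, sizes, seed=0):
    """Draw one random sub-DB (without replacement) per size and summarize it."""
    rng = random.Random(seed)
    rows = []
    for size in sizes:
        sub = rng.sample(haplotypes, size)
        rows.append((size, *haplotype_stats(sub)))
    return rows


if __name__ == "__main__":
    # Hypothetical toy database with a skewed haplotype distribution.
    rng = random.Random(1)
    db = [f"H{int(rng.paretovariate(0.8))}" for _ in range(653)]
    for size, k, h, c in sub_db_curve(db, [100, 200, 400, 653]):
        print(f"n={size:4d}  haplotypes={k:4d}  diversity={h:.4f}  coverage={c:.3f}")
```

On real data, each sub-DB size would typically be resampled many times and the results averaged; a single draw per size is used here only to keep the sketch short.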

Keywords: mtDNA DB; Statistical parameter; Coverage; Phylogeny; Saturation curve


Fig. 1
Saturation curves of expanded sample sizes.

a. Expanded up to 10,000 people

b. Expanded up to 100,000 people

When the group size was increased up to 10,000 and 100,000 people, the final expected number of observable haplotypes was over 4,500. The shaded portion of the graph shows the range between the upper and lower confidence limits.


Fig. 2
Simulated saturation curve based on the number of observed haplotypes.

The graph was obtained with CurveExpert Professional version 1.6.5. The fit converged to a tolerance of 1e-006 in 5 iterations. No weighting was used.



Table 1
Primers Used to Sequence the D-loop of mtDNA

Table 2
Variation of Sequence Diversity and Number of Sequence Changes at Each Sample Size*

Table 3
Number of Observed Haplotypes and Comparison of Statistical Parameters*

Table 4
Comparison of Coverage between Simulated and Non-selected DBs, Showing No Significant Selection Effect

Table 5
Comparison of Observed and Estimated Unique Haplotypes Using the Mao Equation
