Korean J Leg Med, v.38(2)

Kim, Jung, Shin, Yang, and Yang: Sequence Generation and Genotyping of 15 Autosomal STR Markers Using Next Generation Sequencing

Abstract

Recently, next generation sequencing (NGS) has received attention as the ultimate genotyping method for overcoming the limitations of capillary electrophoresis (CE)-based short tandem repeat (STR) analysis. These limitations include the small number of STR loci that can be measured simultaneously with fluorescently labeled primers and the maximum size of STR amplicons. In this study, we analyzed 15 autosomal STR markers by NGS and evaluated how effective the method is for STR analysis. We used male and female standard DNA as single sources, plus a 1:1 mixture of the two. We generated amplicons by multiplex polymerase chain reaction (PCR), constructed DNA libraries by ligating adapters carrying a multiplex identifier (MID), and sequenced the libraries on the Roche GS Junior platform. Sequencing data for each sample were analyzed by alignment to pre-built reference sequences. Most STR alleles could be determined by applying a coverage threshold of 20% for the two single sources and 10% for the 1:1 mixture. The repeat structure of each allele was accurately determined by examining the sequence of the target STR region. The mixture ratio of the mixed sample was estimated from two sources: the coverage ratios between the assigned alleles at each locus, and the reference/variant ratios of the observed sequence variations. In conclusion, the experimental method used in this study successfully generated NGS data. The NGS data analysis protocol also enabled accurate STR allele calling and repeat structure determination at each locus. This NGS-based approach should therefore help forensic investigators interpret and analyze STR profiles from single-source and even mixed samples.
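Because all three samples carry their own MID, pooled reads must first be sorted back into samples before alignment. The sketch below is a minimal illustration of that demultiplexing step. It assumes the MID sits at the very 5′ end of each read. The MID sequences and the function name are hypothetical placeholders, not the Roche MIDs or software used in the study.

```python
from collections import defaultdict

# Hypothetical MID barcodes (placeholders, not the actual MIDs used in the study).
MIDS = {
    "2800M": "ACGTACGTAC",
    "9947A": "TGCATGCATG",
    "mix_1_1": "GATCGATCGA",
}


def demultiplex(reads, mids, max_mismatches=0):
    """Assign each read to a sample by comparing its 5' prefix with each MID.

    The first MID within max_mismatches wins and is trimmed from the read;
    reads matching no MID are collected under 'unassigned' untrimmed.
    """
    bins = defaultdict(list)
    for read in reads:
        hit = None
        for sample, mid in mids.items():
            prefix = read[: len(mid)]
            if len(prefix) < len(mid):
                continue  # read shorter than the barcode
            mismatches = sum(a != b for a, b in zip(prefix, mid))
            if mismatches <= max_mismatches:
                hit = sample
                break
        if hit is None:
            bins["unassigned"].append(read)
        else:
            bins[hit].append(read[len(mids[hit]):])
    return dict(bins)
```

Allowing mismatches is only safe when the barcodes differ from each other at many positions. Otherwise the "first match wins" rule could silently mis-assign reads.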


Fig. 1.
Schematic view of the STR reference sequences. Long flanking sequences of 500 to 550 bp were included in each STR reference sequence so that sample reads generated with any primer combination could be aligned completely.
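Fig. 1 describes references built from the repeat region plus long (500 to 550 bp) flanking sequences. A minimal sketch of that construction follows. It assumes the flanks and repeat region are already available as strings; the function name is hypothetical. The sketch also records where the STR lies, so later steps can test whether a read covers it.

```python
def build_str_reference(left_flank, repeat_region, right_flank):
    """Concatenate flanks and repeat region into one reference sequence.

    Returns (sequence, (start, end)), where (start, end) is the 0-based,
    half-open interval occupied by the repeat region.
    """
    seq = left_flank + repeat_region + right_flank
    start = len(left_flank)
    end = start + len(repeat_region)
    return seq, (start, end)
```

The flanks are much longer than any amplicon (Table 2 lists amplicons up to 474 bp). A read from any primer pair therefore lies wholly within the reference instead of running off its ends.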
Fig. 2.
Quality check of the constructed libraries on a High Sensitivity chip using the 2100 Bioanalyzer. Fragments shorter than 100 bp, including adapter dimers, were successfully removed. a: Standard male DNA 2800M; b: Standard female DNA 9947A; c: 1:1 mixture
Fig. 3.
Estimation of the mixture ratio from the reference/variant ratio of a sequence variation observed at the D13S317 locus. An adenine (A) to thymine (T) variation was detected in the 3′ flanking region of D13S317, and the mixture ratio was estimated at 46% (A) : 53% (T). a: Standard male DNA 2800M; b: Standard female DNA 9947A; c: 1:1 mixture; d: Mixture ratio
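The text states that the mixture ratio was estimated from allele coverage ratios and reference/variant ratios, but it does not give an explicit estimator. The sketch below uses one possible approach: an ordinary least-squares fit for a two-person mixture with known contributor genotypes. This is not necessarily the authors' method. Each allele's observed read fraction is modeled as m·x + (1 − m)·y, where x and y are that allele's dosage (0, 0.5, or 1) in contributors A and B. The function name is hypothetical.

```python
def estimate_mixture_proportion(counts, genotype_a, genotype_b):
    """Least-squares estimate of contributor A's share m in a 2-person mixture.

    counts: dict mapping allele (or base at a variant) -> read count.
    genotype_a, genotype_b: 2-tuples of alleles; homozygotes repeat the allele.
    Reads for alleles outside both genotypes (e.g. stutter) are ignored.
    """
    alleles = set(genotype_a) | set(genotype_b)
    total = sum(counts.get(a, 0) for a in alleles)
    if total == 0:
        raise ValueError("no reads for the contributors' alleles")
    num = den = 0.0
    for a in alleles:
        f = counts.get(a, 0) / total          # observed fraction
        x = genotype_a.count(a) / 2           # dosage in contributor A
        y = genotype_b.count(a) / 2           # dosage in contributor B
        num += (f - y) * (x - y)
        den += (x - y) ** 2
    if den == 0:
        raise ValueError("identical genotypes: proportion not identifiable")
    return min(1.0, max(0.0, num / den))      # clip to a valid proportion


# D3S1358 read counts for the 1:1 mixture, from Table 3 (2800M = A, 9947A = B).
d3 = {14: 1519, 15: 2245, 16: 541, 17: 4936, 18: 3906}
print(round(estimate_mixture_proportion(d3, (17, 18), (14, 15)), 4))  # → 0.7014
```

When the two genotypes share no alleles, the fit reduces to A's share of the assigned reads (8842 of 12,606 at D3S1358). The same function also accepts base counts at a sequence variant such as the D13S317 A/T site, provided each contributor's genotype at that site is known.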
Table 1.
Adjusted Final Concentrations of Primer Sets for Multiplex PCR system∗
Loci Primer Primer sequences (5’→3’) Final Conc. (µM)
D3S1358 D3-PP16-F ACTGCAGTCCAATCTGGGT 0.20
D3-PP16-R ATGAAATCAACAGAGGCTTGC  
TH01 TH01-PP16-F GTGATTCCCATTGGCCTGTTC 0.10
TH01-PP16-R ATTCCTGTGGGCTGAAAAGCTC  
D21S11 D21-PP16-F ATATGTGAGTCAATTCCCCAAG 0.60
D21-PP16-R TGTATTAGTCAATGTTCTCCAGAGAC  
D18S51 D18-PP16-F TTCTTGAGCCCAGAAGGTTA 0.50
D18-PP16-R ATTCTACCAGCAACAACACAAATAAAC  
Penta E PentaE-PP16-F ATTACCAACATGAAAGGGTACCAATA 1.20
PentaE-PP16-R TGGGTTATTAATTGAGAAAACTCCTTACAATTT  
D5S818 D5-PP16-F GGTGATTTTCCTCTTTGGTATCC 0.20
D5-PP16-R AGCCACAGTTTACAACATTTGTATCT  
D13S317 D13-PP16-F ATTACAGAAGTCTGGGATGTGGAGGA 0.40
D13-PP16-R GGCAGCCCAAAAAGACAGA  
D7S820 D7-PP16-F ATGTTGGTCAGGCTGACTATG 0.30
D7-PP16-R GATTCCACATTTATCCTCATTGAC  
D16S539 D16-PP16-F GGGGGTCTAAGAGCTTGTAAAAAG 0.40
D16-PP16-R GTTTGTGTGTGCATCTGTAAGCATGTATC  
CSF1PO CSF1PO-PP16-F CCGGAGGTAAAGGTGTCTTAAAGT 0.30
CSF1PO-PP16-R ATTTCCTGTGTCAGACCCTGTT  
Penta D PentaD-PP16-F GAAGGTCGAAGCTGAAGTG 1.20
PentaD-PP16-R ATTAGAATTCTTTAATCTGGACACAAG  
Amelogenin Amelo-PP16-F CCCTGGGCTCTGTAAAGAA 0.25
Amelo-PP16-R ATCAGAGCTTAAACTGGGAAGCTG  
vWA vWA-PP16-F GCCCTAGTGGATGATAAGAATAATCAGTATGTG 0.15
vWA-PP16-R GGACAGATGATAAATACATAGGATGGATGG  
D8S1179 D8-PP16-F ATTGCAACTTATATGTATTTTTGTATTTCATG 0.50
D8-PP16-R ACCAAATTGTGTTCATGAGTATAGTTTC  
TPOX TPOX-PP16-F GCACAGAACAGGCACTTAGG 0.15
TPOX-PP16-R CGCTCAAACGTGAGGTTG  
FGA FGA-PP16-F GGCTGCAGGGCATAACATTA 0.60
FGA-PP16-R ATTCTATGACTTTGCGCTTCAGGA  

∗Primer sequences are based on the PowerPlex 16 System, without fluorescent dye labels.
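Table 1 gives final primer concentrations, but the excerpt does not give stock concentrations or reaction volume. The sketch below converts final concentrations into pipetting volumes with C1·V1 = C2·V2. It assumes hypothetical 10 µM stocks and a 25 µL reaction. It also assumes that the listed concentration applies to each primer of a pair.

```python
# Final primer concentrations (µM) for a few loci, copied from Table 1.
FINAL_CONC_UM = {
    "D3S1358": 0.20,
    "TH01": 0.10,
    "Penta E": 1.20,
    "vWA": 0.15,
}


def primer_volume_ul(final_um, reaction_ul, stock_um):
    """Volume of primer stock (µL) per reaction, from C1*V1 = C2*V2."""
    return final_um * reaction_ul / stock_um


# Hypothetical setup: 10 µM stocks, 25 µL reaction.
for locus, conc in FINAL_CONC_UM.items():
    print(f"{locus}: {primer_volume_ul(conc, 25, 10):.2f} µL of each primer")
```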

Table 2.
Read Counts of 15 STR Loci in Each Sample
STR locus Amplicon size range (bp) 2800M: All 2800M: Entire STR 2800M: Entire STR/All (%) 9947A: All 9947A: Entire STR 9947A: Entire STR/All (%) 1:1 mixture: All 1:1 mixture: Entire STR 1:1 mixture: Entire STR/All (%)
D3S1358 115-147 9470 8743 92.3 6341 6012 94.8 14261 13306 93.3
D5S818 119-155 9485 8705 91.8 5523 5011 90.7 9347 8531 91.3
D7S820 215-247 3676 3476 94.6 1868 1780 95.3 4815 4603 95.6
D8S1179 203-247 4458 4017 90.1 1967 1805 91.8 3368 3054 90.7
D13S317 169-201 4897 4631 94.6 4060 3868 95.3 12839 12140 94.6
D16S539 264-304 967 877 90.7 708 655 92.5 2497 2361 94.6
D18S51 209-366 739 332 44.9 1284 546 42.5 1117 481 43.1
D21S11 203-259 3045 2313 76.0 2996 2525 84.3 4873 3871 79.4
CSF1PO 321-357 291 244 83.8 596 522 87.6 862 742 86.1
FGA 322-444 956 460 48.1 666 255 38.3 3137 1440 45.9
Penta D 376-441 142 31 21.8 267 56 21.0 403 75 18.6
Penta E 379-474 193 84 43.5 356 116 32.6 563 309 54.9
TH01 156-195 5503 4620 84.0 3324 2811 84.6 6712 5518 82.2
TPOX 262-290 269 230 85.5 215 183 85.1 679 576 84.8
vWA 123-171 3153 2782 88.2 1014 919 90.6 8565 7649 89.3
AMEL 106, 112 3416 3247 95.1 1773 1741 98.2 2334 2247 96.3
Total   50550 44792 88.6 32958 28805 87.4 76372 66903 87.6

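All: all aligned reads, regardless of whether they contain the STR region

Entire STR: aligned reads containing the entire STR region. Entire STR/All (%) values below 50% were shown in bold in the original table (D18S51, FGA, and Penta D in all three samples; Penta E in 2800M and 9947A).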
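The "Entire STR" column counts aligned reads that contain the whole STR region. The sketch below shows how such a column could be derived. It assumes each alignment is reduced to a 0-based, half-open (start, end) interval on the locus reference, and that the STR interval is known. The function names and coordinates are hypothetical.

```python
def spans_entire_str(read_start, read_end, str_start, str_end):
    """True if an aligned read (0-based, half-open) covers the whole STR interval."""
    return read_start <= str_start and read_end >= str_end


def entire_str_percentage(alignments, str_start, str_end):
    """Return (all_reads, entire_str_reads, percentage) for one locus."""
    total = len(alignments)
    entire = sum(spans_entire_str(s, e, str_start, str_end) for s, e in alignments)
    pct = 100 * entire / total if total else 0.0
    return total, entire, pct


# Hypothetical alignments against an STR occupying positions 500-539.
reads = [(0, 600), (450, 900), (520, 1000), (100, 510)]
print(entire_str_percentage(reads, 500, 540))  # → (4, 2, 50.0)
```

Reads that stop inside the repeat cannot support an allele call. This is why only the "Entire STR" subset feeds the allele counts in Table 3.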

Table 3.
Determination of D3S1358 Alleles based on Percentage of Allele Coverage in 2 Single-sources and 1:1 Mixture
Alleles 2800M: allele read count 2800M: allele coverage∗ (%) 9947A: allele read count 9947A: allele coverage∗ (%) 1:1 mixture: allele read count 1:1 mixture: allele coverage∗ (%)
11 0   2 0.03 0  
12 0   12 0.20 5 0.04
13 2 0.02 217 3.61 103 0.77
14 13 0.15 2868 47.70 1519 11.41
15 71 0.81 2879 47.89 2245 16.87
16 495 5.66 34 0.57 541 4.06
17 4355 49.81 0   4936 37.09
18 3757 42.97 0   3906 29.35
19 24 0.27 0   33 0.25
20 26 0.30 0   21 0.16
Total 8743   6012   13309  

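Alleles assigned by the analytical threshold (20% for single sources, 10% for the 1:1 mixture) were shaded in the original table: 17 and 18 in 2800M; 14 and 15 in 9947A; and 14, 15, 17, and 18 in the 1:1 mixture.

∗Percentage of allele coverage (%) = allele read count / locus read count × 100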
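The allele-coverage formula and the 20%/10% thresholds can be sketched directly. Below is a minimal version. It assumes that an allele is assigned when its coverage is greater than or equal to the threshold; the text does not say whether the comparison is strict. The function names are hypothetical.

```python
def allele_coverage(counts):
    """Percentage of locus reads per allele: allele reads / locus reads x 100."""
    total = sum(counts.values())
    return {allele: 100 * n / total for allele, n in counts.items()}


def call_alleles(counts, threshold_pct):
    """Sorted alleles whose coverage is at or above threshold_pct."""
    coverage = allele_coverage(counts)
    return sorted(a for a, pct in coverage.items() if pct >= threshold_pct)


# D3S1358 read counts from Table 3.
m2800 = {13: 2, 14: 13, 15: 71, 16: 495, 17: 4355, 18: 3757, 19: 24, 20: 26}
f9947 = {11: 2, 12: 12, 13: 217, 14: 2868, 15: 2879, 16: 34}
mix = {12: 5, 13: 103, 14: 1519, 15: 2245, 16: 541, 17: 4936, 18: 3906, 19: 33, 20: 21}

print(call_alleles(m2800, 20))  # → [17, 18]
print(call_alleles(f9947, 20))  # → [14, 15]
print(call_alleles(mix, 10))    # → [14, 15, 17, 18]
```

With the Table 3 counts, these thresholds reproduce the D3S1358 genotypes reported in Table 4. The stutter-like allele 16 in the mixture (4.06%) falls below the 10% cutoff.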

Table 4.
STR Genotyping Results in 2 Single-sources and 1:1 Mixture examined by CE and NGS Analyses
STR locus 2800M (CE) 2800M (NGS) 9947A (CE) 9947A (NGS) 1:1 mixture (CE) 1:1 mixture (NGS)
D3S1358 17, 18 17, 18 14, 15 14, 15 14, 15, 17, 18 14, 15, 17, 18
D5S818 12 12 11 11 11, 12 11, 12
D7S820 8, 11 8, 11 10, 11 10, 11 8, 10, 11 8, 10, 11
D8S1179 14, 15 14, 15 13 13 13, 14, 15 13, 14, 15
D13S317 9, 11 9, 11 11 11 9, 11 9, 11
D16S539 9, 13 9, 13 11, 12 11, 12 9, 11, 12, 13 9, 11, 12, 13
D18S51 16, 18 16, 18 15, 19 15, 19 15, 16, 18, 19 15, 16, 18, 19
D21S11 29, 31.2 29, 31.2 30 30 29, 30, 31.2 29, 30, 31.2
CSF1PO 12 12 10, 12 10, 12 10, 12 10, 12
FGA 20, 23 20, 23 23, 24 23, 24 20, 23, 24 20, 23, 24
Penta D 12, 13 12, 13 12 12 12, 13 12, 13
Penta E 7, 14 7, 14 12, 13 12, 13 7, 12, 13, 14 7, (12), 13, 14
TH01 6, 9.3 6, 9.3 8, 9.3 8, 9.3 6, 8, 9.3 6, 8, 9.3
TPOX 11 11 8 8 8, 11 8, 11
vWA 16, 19 16, 19 17, 18 17, 18 16, 17, 18, 19 16, 17, (18), 19

Alleles in parentheses are true alleles whose coverage was below 10% of the total locus coverage.
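The sketch below checks CE and NGS calls in Table 4 for concordance. It treats parenthesized NGS alleles as detected, but reports them separately as below-threshold. Alleles are kept as strings so that microvariants such as 31.2 and 9.3 stay intact. The function names are hypothetical.

```python
import re


def parse_calls(text):
    """Parse a call string like '7, (12), 13, 14'.

    Returns (set of all alleles, set of parenthesized alleles), as strings.
    """
    alleles, below = set(), set()
    for token in (t.strip() for t in text.split(",")):
        match = re.fullmatch(r"\((.+)\)", token)
        if match:
            below.add(match.group(1))
            alleles.add(match.group(1))
        else:
            alleles.add(token)
    return alleles, below


def compare_ce_ngs(ce, ngs):
    """Return (concordant, alleles that NGS detected only below threshold)."""
    ce_set, _ = parse_calls(ce)
    ngs_set, below = parse_calls(ngs)
    return ce_set == ngs_set, below


print(compare_ce_ngs("7, 12, 13, 14", "7, (12), 13, 14"))  # → (True, {'12'})
```

Counting parenthesized alleles as detected, every NGS call in Table 4 matches the CE call. The only cost of the 10% threshold in the mixture is Penta E 12 and vWA 18.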

Table 5.
Repeat Structures of 15 STRs in Two Standard Samples from NGS Data
A. 2800M
STR locus Genotype Core repeat Repeat structure
D3S1358 17, 18 TCTA 17: TCTA [TCTG]3 [TCTA]13
18: TCTA [TCTG]3 [TCTA]14
D5S818 12 AGAT 12: [AGAT]12
D7S820 8, 11 GATA 8: [GATA]8
11: [GATA]11
D8S1179 14, 15 TCTA 14: TCTA TCTG [TCTA]12
15: [TCTA]2 TCTG [TCTA]12
D13S317 9, 11 TATC 9: [TATC]9 [AATC]2
11: [TATC]11 TATC AATC
D16S539 9, 13 GATA 9: [GATA]9
13: [GATA]13
D18S51 16, 18 AGAA 16: [AGAA]16 AAAG [AG]3
18: [AGAA]18 AAAG [AG]3
D21S11 29, 31.2 TCTA 29: [TCTA]4 [TCTG]6 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCA TA [TCTA]11
31.2: [TCTA]5 [TCTG]6 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCA TA [TCTA]11 TA TCTA
CSF1PO 12 AGAT 12: [AGAT]12
FGA 20, 23 CTTT 20: [TTTC]3 TTTT TTCT [CTTT]12 CTCC [TTCC]2
23: [TTTC]3 TTTT TTCT [CTTT]15 CTCC [TTCC]2
Penta D 12, 13 AAAGA 12: [AAAGA]12
13: [AAAGA]13
Penta E 7, 14 AAAGA 7: [AAAGA]7
14: [AAAGA]14
TH01 6, 9.3 AATG 6: [AATG]6
9.3: [AATG]6 ATG [AATG]3
TPOX 11 AATG 11: [AATG]11
vWA 16, 19 TCTA 16: TCTA [TCTG]3 [TCTA]12 TCCA TCTA
19: TCTA [TCTG]4 [TCTA]14 TCCA TCTA
B. 9947A
STR locus Genotype Core repeat Repeat structure
D3S1358 14, 15 TCTA 14: TCTA [TCTG]2 [TCTA]11
15: TCTA [TCTG]2 [TCTA]12
D5S818 11 AGAT 11: [AGAT]11
D7S820 10, 11 GATA 10: [GATA]10
11: [GATA]11
D8S1179 13 TCTA 13a: TCTA TCTG [TCTA]11
13b: [TCTA]13
D13S317 11 TATC 11: [TATC]11 [AATC]2
D16S539 11, 12 GATA 11: [GATA]11
12: [GATA]12
D18S51 15, 19 AGAA 15: [AGAA]15 AAAG [AG]3
19: [AGAA]19 AAAG [AG]3
D21S11 30 TCTA 30: [TCTA]6 [TCTG]5 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCA TA [TCTA]11
CSF1PO 10, 12 AGAT 10: [AGAT]10
12: [AGAT]12
FGA 23, 24 CTTT 23: [TTTC]3 TTTT TTCT [CTTT]15 CTCC [TTCC]2
24: [TTTC]3 TTTT TTCT [CTTT]16 CTCC [TTCC]2
Penta D 12 AAAGA 12: [AAAGA]12
Penta E 12, 13 AAAGA 12: [AAAGA]12
13: [AAAGA]13
TH01 8, 9.3 AATG 8: [AATG]8
9.3: [AATG]6 ATG [AATG]3
TPOX 8 AATG 8: [AATG]8
vWA 17, 18 TCTA 17: TCTA [TCTG]4 [TCTA]12 TCCA TCTA
18: TCTA [TCTG]4 [TCTA]13 TCCA TCTA
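The bracket notation in Table 5 collapses runs of identical repeat units. Below is a simplified sketch of producing it from an extracted repeat sequence. It assumes the sequence consists only of whole units of one fixed length. Partial or shorter units, such as the ATG in TH01 9.3 or the [AG]3 tail of D18S51, would need extra handling. The function name is hypothetical.

```python
from itertools import groupby


def bracket_notation(seq, unit_len=4):
    """Collapse consecutive identical repeat units into bracket notation.

    Assumes seq is made of whole units of unit_len bases; raises ValueError
    otherwise. A single unit is written bare, a run of n > 1 as [UNIT]n.
    """
    if len(seq) % unit_len:
        raise ValueError("sequence length is not a multiple of the unit length")
    units = [seq[i:i + unit_len] for i in range(0, len(seq), unit_len)]
    parts = []
    for unit, run in groupby(units):
        n = len(list(run))
        parts.append(f"[{unit}]{n}" if n > 1 else unit)
    return " ".join(parts)


# D3S1358 allele 17 of 2800M, as listed in Table 5.
print(bracket_notation("TCTA" + "TCTG" * 3 + "TCTA" * 13))  # → TCTA [TCTG]3 [TCTA]13
```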