Sequence Generation and Genotyping of 15 Autosomal STR Markers Using Next Generation Sequencing

Eun Hye Kim; Sang-Eun Jung; Kyoung-Jin Shin; Woo Ick Yang; In Seok Yang

doi:10.7580/kjlm.2014.38.2.48


Original Article

Korean J Leg Med 2014;38(2):48-58.

Published online: 17 January 2014

Eun Hye Kim¹,², Sang-Eun Jung¹, Kyoung-Jin Shin¹,², Woo Ick Yang¹, In Seok Yang¹

¹Department of Forensic Medicine, Yonsei University College of Medicine, Seoul, Korea

²Brain Korea 21 PLUS Project for Medical Science, Yonsei University, Seoul, Korea

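This research was supported by the 2012 research and development fund of the Supreme Prosecutors' Office, "Localization of the offender DNA database and DNA identification technology and establishment of a foundation for next-generation advanced technology" (1333-304-260).

Corresponding author: In Seok Yang, Department of Forensic Medicine, Yonsei University College of Medicine, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-752, Korea. Tel: +82-2-2228-2691, Fax: +82-2-362-0860, E-mail: graduate@nate.com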

Received 25 April 2014; Revised 9 May 2014; Accepted 13 May 2014

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Recently, next generation sequencing (NGS) has received attention as a genotyping method that can overcome the limitations of capillary electrophoresis (CE)-based short tandem repeat (STR) analysis, such as the limited number of STR loci that can be measured simultaneously using fluorescent-labeled primers and the maximum size of STR amplicons. In this study, we analyzed 15 autosomal STR markers by NGS and evaluated the effectiveness of the method for STR analysis. Using male and female standard DNA as single-source samples and their 1:1 mixture, we generated sample amplicons by multiplex polymerase chain reaction (PCR), constructed DNA libraries by ligating adapters carrying a multiplex identifier (MID), and sequenced the libraries on the Roche GS Junior platform. Sequencing data for each sample were analyzed by alignment with pre-built reference sequences. Most STR alleles could be determined by applying a coverage threshold of 20% for the two single-source samples and 10% for the 1:1 mixture. The structure of the STR in each allele was accurately determined by examining the sequences of the target STR region. The mixture ratio of the mixed sample was estimated by analyzing the coverage ratios between assigned alleles at each locus and the reference/variant ratios from the observed sequence variations. In conclusion, the experimental method used in this study allowed the successful generation of NGS data. In addition, the NGS data analysis protocol enabled accurate STR allele calling and repeat structure determination at each locus. Therefore, this NGS-based approach will be helpful for interpreting and analyzing STR profiles from single-source and even mixed samples in forensic investigations.

Keywords: Short tandem repeat, Next generation sequencing, Repeat structure, Sequence variation, Mixture

Fig. 1.

Schematic view of the STR reference sequences. Long flanking sequences of 500 bp to 550 bp were included in the STR reference sequences so that sample sequences generated with any primer combination could be aligned completely.

Fig. 2.

Quality check of the constructed libraries on a High Sensitivity chip using the 2100 Bioanalyzer. Fragments shorter than 100 bp, including adaptor dimers, were successfully removed. a: Standard male DNA 2800M; b: Standard female DNA 9947A; c: 1:1 mixture

Fig. 3.

Estimation of the mixture ratio based on reference/variant ratios from the sequence variation observed at the D13S317 locus. An adenine (A) to thymine (T) sequence variation was detected in the 3′ flanking region of the D13S317 locus. The mixture ratio was estimated to be 46% (A) : 53% (T). a: Standard male DNA 2800M; b: Standard female DNA 9947A; c: 1:1 mixture; d: Mixture ratio

Table 1.

Adjusted Final Concentrations of Primer Sets for Multiplex PCR system∗

Loci	Primer	Primer sequences (5′→3′)	Final Conc. (µM)
D3S1358	D3-PP16-F	ACTGCAGTCCAATCTGGGT	0.20
D3S1358	D3-PP16-R	ATGAAATCAACAGAGGCTTGC
TH01	TH01-PP16-F	GTGATTCCCATTGGCCTGTTC	0.10
TH01	TH01-PP16-R	ATTCCTGTGGGCTGAAAAGCTC
D21S11	D21-PP16-F	ATATGTGAGTCAATTCCCCAAG	0.60
D21S11	D21-PP16-R	TGTATTAGTCAATGTTCTCCAGAGAC
D18S51	D18-PP16-F	TTCTTGAGCCCAGAAGGTTA	0.50
D18S51	D18-PP16-R	ATTCTACCAGCAACAACACAAATAAAC
Penta E	PentaE-PP16-F	ATTACCAACATGAAAGGGTACCAATA	1.20
Penta E	PentaE-PP16-R	TGGGTTATTAATTGAGAAAACTCCTTACAATTT
D5S818	D5-PP16-F	GGTGATTTTCCTCTTTGGTATCC	0.20
D5S818	D5-PP16-R	AGCCACAGTTTACAACATTTGTATCT
D13S317	D13-PP16-F	ATTACAGAAGTCTGGGATGTGGAGGA	0.40
D13S317	D13-PP16-R	GGCAGCCCAAAAAGACAGA
D7S820	D7-PP16-F	ATGTTGGTCAGGCTGACTATG	0.30
D7S820	D7-PP16-R	GATTCCACATTTATCCTCATTGAC
D16S539	D16-PP16-F	GGGGGTCTAAGAGCTTGTAAAAAG	0.40
D16S539	D16-PP16-R	GTTTGTGTGTGCATCTGTAAGCATGTATC
CSF1PO	CSF1PO-PP16-F	CCGGAGGTAAAGGTGTCTTAAAGT	0.30
CSF1PO	CSF1PO-PP16-R	ATTTCCTGTGTCAGACCCTGTT
Penta D	PentaD-PP16-F	GAAGGTCGAAGCTGAAGTG	1.20
Penta D	PentaD-PP16-R	ATTAGAATTCTTTAATCTGGACACAAG
Amelogenin	Amelo-PP16-F	CCCTGGGCTCTGTAAAGAA	0.25
Amelogenin	Amelo-PP16-R	ATCAGAGCTTAAACTGGGAAGCTG
vWA	vWA-PP16-F	GCCCTAGTGGATGATAAGAATAATCAGTATGTG	0.15
vWA	vWA-PP16-R	GGACAGATGATAAATACATAGGATGGATGG
D8S1179	D8-PP16-F	ATTGCAACTTATATGTATTTTTGTATTTCATG	0.50
D8S1179	D8-PP16-R	ACCAAATTGTGTTCATGAGTATAGTTTC
TPOX	TPOX-PP16-F	GCACAGAACAGGCACTTAGG	0.15
TPOX	TPOX-PP16-R	CGCTCAAACGTGAGGTTG
FGA	FGA-PP16-F	GGCTGCAGGGCATAACATTA	0.60
FGA	FGA-PP16-R	ATTCTATGACTTTGCGCTTCAGGA

∗ Each primer sequence is based on information from the PowerPlex 16 system, without the fluorescent dye.

Table 2.

Read Counts of 15 STR Loci in Each Sample

STR locus	Amplicon size range (bp)	2800M: All∗	2800M: Entire STR†	2800M: Entire STR/All (%)	9947A: All∗	9947A: Entire STR†	9947A: Entire STR/All (%)	1:1 mixture: All∗	1:1 mixture: Entire STR†	1:1 mixture: Entire STR/All (%)
D3S1358	115-147	9470	8743	92.3	6341	6012	94.8	14261	13306	93.3
D5S818	119-155	9485	8705	91.8	5523	5011	90.7	9347	8531	91.3
D7S820	215-247	3676	3476	94.6	1868	1780	95.3	4815	4603	95.6
D8S1179	203-247	4458	4017	90.1	1967	1805	91.8	3368	3054	90.7
D13S317	169-201	4897	4631	94.6	4060	3868	95.3	12839	12140	94.6
D16S539	264-304	967	877	90.7	708	655	92.5	2497	2361	94.6
D18S51	209-366	739	332	44.9	1284	546	42.5	1117	481	43.1
D21S11	203-259	3045	2313	76.0	2996	2525	84.3	4873	3871	79.4
CSF1PO	321-357	291	244	83.8	596	522	87.6	862	742	86.1
FGA	322-444	956	460	48.1	666	255	38.3	3137	1440	45.9
Penta D	376-441	142	31	21.8	267	56	21.0	403	75	18.6
Penta E	379-474	193	84	43.5	356	116	32.6	563	309	54.9
TH01	156-195	5503	4620	84.0	3324	2811	84.6	6712	5518	82.2
TPOX	262-290	269	230	85.5	215	183	85.1	679	576	84.8
vWA	123-171	3153	2782	88.2	1014	919	90.6	8565	7649	89.3
AMEL	106, 112	3416	3247	95.1	1773	1741	98.2	2334	2247	96.3
Total		50550	44792	88.6	32958	28805	87.4	76372	66903	87.6

∗ All aligned reads, regardless of the presence or absence of the STR region

† Aligned reads containing the entire STR region

Entire STR/All (%) values of less than 50% are shown in bold text
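The Entire STR/All (%) column can be recomputed directly from the read counts. The following is a minimal Python sketch using a subset of the 2800M values from Table 2. The function names and the default 50% cutoff argument are illustrative choices made for this example and are not taken from the authors' analysis pipeline.

```python
# (All aligned reads, reads containing the entire STR) for 2800M,
# copied from a subset of rows in Table 2.
READS_2800M = {
    "D3S1358": (9470, 8743),
    "D18S51": (739, 332),
    "D21S11": (3045, 2313),
    "FGA": (956, 460),
    "Penta D": (142, 31),
    "Penta E": (193, 84),
}


def entire_str_pct(all_reads, entire_reads):
    """Percentage of aligned reads that span the entire STR region."""
    return 100.0 * entire_reads / all_reads


def loci_below(table, cutoff=50.0):
    """Loci whose Entire STR/All percentage is below the cutoff (hypothetical helper)."""
    return [
        locus
        for locus, (all_reads, entire_reads) in table.items()
        if entire_str_pct(all_reads, entire_reads) < cutoff
    ]


for locus, (all_reads, entire_reads) in READS_2800M.items():
    print(f"{locus}: {entire_str_pct(all_reads, entire_reads):.1f}%")
print("Below 50%:", loci_below(READS_2800M))
```

In this subset, the loci that fall below 50% are those with the longer amplicon size ranges listed in the table.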

Table 3.

Determination of D3S1358 Alleles based on Percentage of Allele Coverage in 2 Single-sources and 1:1 Mixture

Alleles	2800M: Allele read count	2800M: Allele coverage∗ (%)	9947A: Allele read count	9947A: Allele coverage∗ (%)	1:1 mixture: Allele read count	1:1 mixture: Allele coverage∗ (%)
11	0		2	0.03	0
12	0		12	0.20	5	0.04
13	2	0.02	217	3.61	103	0.77
14	13	0.15	2868	47.70	1519	11.41
15	71	0.81	2879	47.89	2245	16.87
16	495	5.66	34	0.57	541	4.06
17	4355	49.81	0		4936	37.09
18	3757	42.97	0		3906	29.35
19	24	0.27	0		33	0.25
20	26	0.30	0		21	0.16
Total	8743		6012		13309

Shaded sections indicate assigned alleles based on the analytical threshold

∗ Percentage of allele coverage (%) = (allele read count / locus read count) × 100
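The allele coverage formula and the thresholds stated in the abstract (20% for single-source samples, 10% for the 1:1 mixture) can be applied to the D3S1358 read counts in Table 3. The Python sketch below is an illustration only: the function names are hypothetical, and treating the threshold as "greater than or equal to" is an assumption, since the exact comparison rule is not stated here.

```python
# Non-zero allele read counts for D3S1358, copied from Table 3.
D3S1358_COUNTS = {
    "2800M": {"13": 2, "14": 13, "15": 71, "16": 495,
              "17": 4355, "18": 3757, "19": 24, "20": 26},
    "9947A": {"11": 2, "12": 12, "13": 217, "14": 2868, "15": 2879, "16": 34},
    "1:1 mixture": {"12": 5, "13": 103, "14": 1519, "15": 2245, "16": 541,
                    "17": 4936, "18": 3906, "19": 33, "20": 21},
}


def allele_coverage(counts):
    """Allele coverage (%) = allele read count / locus read count x 100."""
    locus_total = sum(counts.values())
    return {allele: 100.0 * n / locus_total for allele, n in counts.items()}


def call_alleles(counts, threshold_pct):
    """Alleles whose coverage meets the threshold (assumed: >= threshold)."""
    coverage = allele_coverage(counts)
    called = [allele for allele, pct in coverage.items() if pct >= threshold_pct]
    return sorted(called, key=float)


print(call_alleles(D3S1358_COUNTS["2800M"], 20))        # single source, 20%
print(call_alleles(D3S1358_COUNTS["9947A"], 20))        # single source, 20%
print(call_alleles(D3S1358_COUNTS["1:1 mixture"], 10))  # mixture, 10%
```

With these inputs the calls agree with the D3S1358 genotypes reported for the NGS analysis in Table 4.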

Table 4.

STR Genotyping Results in 2 Single-sources and 1:1 Mixture examined by CE and NGS Analyses

STR locus	2800M: CE	2800M: NGS	9947A: CE	9947A: NGS	1:1 mixture: CE	1:1 mixture: NGS
D3S1358	17, 18	17, 18	14, 15	14, 15	14, 15, 17, 18	14, 15, 17, 18
D5S818	12	12	11	11	11, 12	11, 12
D7S820	8, 11	8, 11	10, 11	10, 11	8, 10, 11	8, 10, 11
D8S1179	14, 15	14, 15	13	13	13, 14, 15	13, 14, 15
D13S317	9, 11	9, 11	11	11	9, 11	9, 11
D16S539	9, 13	9, 13	11, 12	11, 12	9, 11, 12, 13	9, 11, 12, 13
D18S51	16, 18	16, 18	15, 19	15, 19	15, 16, 18, 19	15, 16, 18, 19
D21S11	29, 31.2	29, 31.2	30	30	29, 30, 31.2	29, 30, 31.2
CSF1PO	12	12	10, 12	10, 12	10, 12	10, 12
FGA	20, 23	20, 23	23, 24	23, 24	20, 23, 24	20, 23, 24
Penta D	12, 13	12, 13	12	12	12, 13	12, 13
Penta E	7, 14	7, 14	12, 13	12, 13	7, 12, 13, 14	7, (12), 13, 14
TH01	6, 9.3	6, 9.3	8, 9.3	8, 9.3	6, 8, 9.3	6, 8, 9.3
TPOX	11	11	8	8	8, 11	8, 11
vWA	16, 19	16, 19	17, 18	17, 18	16, 17, 18, 19	16, 17, (18), 19

Alleles in parentheses are true alleles whose coverage was less than 10% of the total coverage at the locus
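The abstract mentions estimating the mixture ratio from coverage ratios between assigned alleles. One simple way to illustrate the idea, assuming that only alleles unique to each contributor are compared and that stutter and amplification bias are ignored, is sketched below in Python for D3S1358, where the two standards share no alleles. The genotypes come from Table 4 and the mixture read counts from Table 3; the function name and the unique-allele approach are assumptions made for this example, not the authors' documented procedure.

```python
# D3S1358 genotypes of the two contributors (Table 4) and the read counts
# of the assigned alleles in the 1:1 mixture (Table 3).
GENOTYPES = {"2800M": {"17", "18"}, "9947A": {"14", "15"}}
MIXTURE_COUNTS = {"14": 1519, "15": 2245, "17": 4936, "18": 3906}


def contributor_shares(genotypes, counts):
    """Percentage of reads on alleles unique to each contributor.

    Shared alleles are skipped because their reads cannot be attributed
    to a single contributor without further modelling.
    """
    unique_reads = {}
    for name, alleles in genotypes.items():
        others = set().union(*(g for n, g in genotypes.items() if n != name))
        unique_reads[name] = sum(counts.get(a, 0) for a in alleles - others)
    total = sum(unique_reads.values())
    return {name: 100.0 * n / total for name, n in unique_reads.items()}


for name, pct in contributor_shares(GENOTYPES, MIXTURE_COUNTS).items():
    print(f"{name}: {pct:.1f}% of reads on contributor-specific alleles")
```

A single locus gives only a rough estimate; combining several loci, as the abstract describes, would be expected to give a more stable figure.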

Table 5.

Repeat Structures of 15 STRs in Two Standard Samples from NGS Data

A. 2800M
STR locus	Genotype	Core repeat	Repeat structure
D3S1358	17, 18	TCTA	17: TCTA [TCTG]₃ [TCTA]₁₃
D3S1358	17, 18	TCTA	18: TCTA [TCTG]₃ [TCTA]₁₄
D5S818	12	AGAT	12: [AGAT]₁₂
D7S820	8, 11	GATA	8: [GATA]₈
D7S820	8, 11	GATA	11: [GATA]₁₁
D8S1179	14, 15	TCTA	14: TCTA TCTG [TCTA]₁₂
D8S1179	14, 15	TCTA	15: [TCTA]₂ TCTG [TCTA]₁₂
D13S317	9, 11	TATC	9: [TATC]₉ [AATC]₂
D13S317	9, 11	TATC	11: [TATC]₁₁ TATC AATC
D16S539	9, 13	GATA	9: [GATA]₉
D16S539	9, 13	GATA	13: [GATA]₁₃
D18S51	16, 18	AGAA	16: [AGAA]₁₆ AAAG [AG]₃
D18S51	16, 18	AGAA	18: [AGAA]₁₈ AAAG [AG]₃
D21S11	29, 31.2	TCTA	29: [TCTA]₄ [TCTG]₆ [TCTA]₃ TA [TCTA]₃ TCA [TCTA]₂ TCCA TA [TCTA]₁₁
D21S11	29, 31.2	TCTA	31.2: [TCTA]₅ [TCTG]₆ [TCTA]₃ TA [TCTA]₃ TCA [TCTA]₂ TCCA TA [TCTA]₁₁ TA TCTA
CSF1PO	12	AGAT	12: [AGAT]₁₂
FGA	20, 23	CTTT	20: [TTTC]₃ TTTT TTCT [CTTT]₁₂ CTCC [TTCC]₂
FGA	20, 23	CTTT	23: [TTTC]₃ TTTT TTCT [CTTT]₁₅ CTCC [TTCC]₂
Penta D	12, 13	AAAGA	12: [AAAGA]₁₂
Penta D	12, 13	AAAGA	13: [AAAGA]₁₃
Penta E	7, 14	AAAGA	7: [AAAGA]₇
Penta E	7, 14	AAAGA	14: [AAAGA]₁₄
TH01	6, 9.3	AATG	6: [AATG]₆
TH01	6, 9.3	AATG	9.3: [AATG]₆ ATG [AATG]₃
TPOX	11	AATG	11: [AATG]₁₁
vWA	16, 19	TCTA	16: TCTA [TCTG]₃ [TCTA]₁₂ TCCA TCTA
vWA	16, 19	TCTA	19: TCTA [TCTG]₄ [TCTA]₁₄ TCCA TCTA

B. 9947A
STR locus	Genotype	Core repeat	Repeat structure
D3S1358	14, 15	TCTA	14: TCTA [TCTG]₂ [TCTA]₁₁
D3S1358	14, 15	TCTA	15: TCTA [TCTG]₂ [TCTA]₁₂
D5S818	11	AGAT	11: [AGAT]₁₁
D7S820	10, 11	GATA	10: [GATA]₁₀
D7S820	10, 11	GATA	11: [GATA]₁₁
D8S1179	13	TCTA	13a: TCTA TCTG [TCTA]₁₁
D8S1179	13	TCTA	13b: [TCTA]₁₃
D13S317	11	TATC	11: [TATC]₁₁ [AATC]₂
D16S539	11, 12	GATA	11: [GATA]₁₁
D16S539	11, 12	GATA	12: [GATA]₁₂
D18S51	15, 19	AGAA	15: [AGAA]₁₅ AAAG [AG]₃
D18S51	15, 19	AGAA	19: [AGAA]₁₉ AAAG [AG]₃
D21S11	30	TCTA	30: [TCTA]₆ [TCTG]₅ [TCTA]₃ TA [TCTA]₃ TCA [TCTA]₂ TCCA TA [TCTA]₁₁
CSF1PO	10, 12	AGAT	10: [AGAT]₁₀
CSF1PO	10, 12	AGAT	12: [AGAT]₁₂
FGA	23, 24	CTTT	23: [TTTC]₃ TTTT TTCT [CTTT]₁₅ CTCC [TTCC]₂
FGA	23, 24	CTTT	24: [TTTC]₃ TTTT TTCT [CTTT]₁₆ CTCC [TTCC]₂
Penta D	12	AAAGA	12: [AAAGA]₁₂
Penta E	12, 13	AAAGA	12: [AAAGA]₁₂
Penta E	12, 13	AAAGA	13: [AAAGA]₁₃
TH01	8, 9.3	AATG	8: [AATG]₈
TH01	8, 9.3	AATG	9.3: [AATG]₆ ATG [AATG]₃
TPOX	8	AATG	8: [AATG]₈
vWA	17, 18	TCTA	17: TCTA [TCTG]₄ [TCTA]₁₂ TCCA TCTA
vWA	17, 18	TCTA	18: TCTA [TCTG]₄ [TCTA]₁₃ TCCA TCTA
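The bracket notation used in Table 5 can be expanded back into a nucleotide string, which is convenient when comparing a reported repeat structure with an observed read. The Python sketch below is a hypothetical helper written for this example; it accepts repeat counts written either as plain digits or as Unicode subscripts. Note that the sum of the repeat counts matches the allele designation only at loci where every listed block is counted in the nomenclature (for example D3S1358), so the parser makes no attempt to derive allele names.

```python
import re

# Map Unicode subscript digits to ASCII digits so both notations parse.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
# Either a bracketed motif with a count, e.g. [TCTG]3, or a bare motif, e.g. TCTA.
TOKEN = re.compile(r"\[([ACGT]+)\](\d+)|([ACGT]+)")


def parse_structure(structure):
    """Return the repeat structure as a list of (motif, count) pairs."""
    blocks = []
    for match in TOKEN.finditer(structure.translate(SUBSCRIPTS)):
        if match.group(1):
            blocks.append((match.group(1), int(match.group(2))))
        else:
            blocks.append((match.group(3), 1))  # bare motif occurs once
    return blocks


def expand(structure):
    """Expand the bracket notation into the full nucleotide sequence."""
    return "".join(motif * count for motif, count in parse_structure(structure))


d3_allele_17 = "TCTA [TCTG]₃ [TCTA]₁₃"   # D3S1358 allele 17 in 2800M
th01_allele_9_3 = "[AATG]₆ ATG [AATG]₃"  # TH01 allele 9.3

print(parse_structure(d3_allele_17))
print(expand(th01_allele_9_3), len(expand(th01_allele_9_3)))
```

For the TH01 9.3 allele, the expanded sequence consists of nine complete AATG units plus the three-base ATG unit, which is what the ".3" in the allele name denotes.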
