Abstract
Recent advances in long-read next-generation sequencing (NGS) have enabled researchers to identify several pathogenic variants overlooked by short-read NGS, array-based comparative genomic hybridization, and other conventional methods. Long-read NGS is particularly useful in the detection of structural variants and repeat expansions. Furthermore, it can be used for mutation screening in difficult-to-sequence regions, as well as for DNA-methylation analyses and haplotype phasing. This mini-review introduces the usefulness of long-read NGS in the molecular diagnosis of pediatric endocrine disorders.
· Long-read next-generation sequencing (NGS) can provide sequence reads of several kilobases or megabases and therefore identify several pathogenic variants that have been overlooked by other methods.
· Despite its relatively high error rate and high cost, long-read NGS is becoming an important tool for molecular diagnosis of congenital diseases, including endocrine disorders.
Nucleotide substitutions and structural variants in the genome play a major role in the development of congenital diseases, including endocrine disorders [1-3]. The identification of pathogenic variants in patients with congenital disorders helps to optimize management procedures and increase the accuracy of genetic counseling [4]. Currently, short-read next-generation sequencing (NGS) and array-based comparative genomic hybridization (CGH), together with conventional methods such as karyotyping, Sanger sequencing, and fluorescence in situ hybridization, are used to identify pathogenic mutations and copy number variants (CNVs) [1,2,5]. Still, these methods have several technical limitations. For example, short-read NGS typically yields sequence reads measuring 100 or 150 bp in length and is optimized to detect simple nucleotide substitutions and indels [1]. Likewise, array-based CGH primarily focuses on CNVs in single-copy genomic regions (https://www.agilent.com/en/product/cgh-cgh-snp-microarray-platform). As a result, these methods often overlook chromosomal rearrangements, particularly retrotransposon insertions, repeat expansions, and copy number neutral inversions. These missing genomic variants may account for a certain percentage of cases with congenital disorders. Moreover, short-read NGS and array-based CGH cannot be used to evaluate epigenetic abnormalities.
Recent advances in long-read NGS have enabled researchers to identify several previously unrecognized genetic variants [6-10]. Furthermore, long-read NGS can be used for mutation screening in difficult-to-sequence regions, as well as for DNA-methylation analyses and haplotype phasing [7,9]. The relatively high error rate and high cost of long-read NGS are currently being reduced [10]. In this mini-review, we introduce the usefulness of long-read NGS in the molecular diagnosis of congenital disorders, particularly of pediatric endocrine disorders.
Nature Methods described long-read NGS as the "Method of the Year 2022" [7]. Long-read NGS is becoming a popular tool for molecular analysis of clinical samples. Two technologies, i.e., single-molecule real-time sequencing by Pacific Biosciences (Menlo Park, CA, USA) and nanopore sequencing by Oxford Nanopore Technologies (ONT) (Oxford, UK), are predominantly used in research and clinical settings [7]. These 2 technologies can provide sequence reads of several kilobases [9] and therefore are capable of characterizing complex structural variants, chromosomal inversions, retrotransposon insertions, and repeat expansions. Long-read NGS can also be used to detect simple nucleotide substitutions and indels, although its sequence data have a higher error rate than those of short-read NGS. Notably, long-read NGS can be used not only for whole-genome sequencing but also for targeted sequencing of specific regions of interest [8,11]. Targeted sequencing significantly reduces the cost of sequence analyses. In particular, ONT has provided a system for software-based target assignment designated as adaptive sampling [11]. Furthermore, long-read NGS has advantages in DNA-methylation analysis and rapid workflow [7].
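The idea of software-based target assignment can be illustrated with a minimal sketch. It assumes that the first portion of each read has already been mapped to a reference position, and that the software then only has to decide whether to continue sequencing the molecule or to eject it. The region names, coordinates, and function names below are hypothetical placeholders and do not represent the ONT implementation.

```python
from typing import List, Tuple

# Hypothetical target regions: (chromosome, start, end), half-open coordinates.
Region = Tuple[str, int, int]


def overlaps_target(chrom: str, start: int, end: int, targets: List[Region]) -> bool:
    """Return True if the interval [start, end) overlaps any target region."""
    return any(
        chrom == t_chrom and start < t_end and end > t_start
        for t_chrom, t_start, t_end in targets
    )


def decide(chrom: str, start: int, end: int, targets: List[Region]) -> str:
    """Keep sequencing on-target molecules and eject off-target ones."""
    return "continue" if overlaps_target(chrom, start, end, targets) else "eject"


# Placeholder target list (not real genomic coordinates).
targets = [("chrT", 1_000, 5_000)]

print(decide("chrT", 1_200, 1_600, targets))  # read start maps inside the target
print(decide("chrU", 1_200, 1_600, targets))  # read start maps elsewhere
```

Because the decision is made in software, the target list can be changed between runs without redesigning a wet-lab enrichment step, which is consistent with the cost advantage of targeted sequencing described above.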
Long-read NGS is particularly beneficial for the detection of specific types of genomic abnormalities (Fig. 1). In the following sections, we provide some examples of molecular diagnoses of endocrine disorders achieved by long-read NGS.
Recently, long-read NGS succeeded in identifying several missing variants associated with congenital diseases [6-10]. Identified variants included CNVs involving transposable elements [6,7,9]. It is known that such elements account for a substantial percentage of the human genome and that insertions of these elements can cause disease phenotypes by disrupting exons or the cis-regulatory machinery of gene expression [11,12]. Nevertheless, CNVs of transposable elements are barely discernible by short-read NGS and array-based CGH. In 2022, Miller et al. [13] performed targeted long-read NGS for a family with autosomal dominant pseudohypoparathyroidism and successfully identified a ~2.8-kb insertion in the GNAS region on chromosome 20. This insertion was a retrotransposon element comprising an SVA-VNTR-Alu sequence and is assumed to have caused the disease phenotype through epigenetic dysregulation of nearby genes. Similarly, an Alu insertion was identified in an ALMS1 intron of patients with Alström syndrome [14], and an insertion of the SINE-VNTR-Alu retroelement was detected in an NR5A1 intron of patients with disorders of sex development [15]. Many other examples support the usefulness of long-read NGS in the detection of missing structural variants [6,7,9].
Furthermore, long-read NGS is capable of analyzing copy number alterations of repeats and repetitive sequences that are often overlooked by other methods [6,7,9]. Expansions of short tandem repeats, particularly triplet repeat expansions, are known to be one of the major causes of neurological disorders and have also been linked to other types of disorders [16,17]. For example, a trinucleotide repeat expansion in an intron of DMD was identified in patients with myopathy [18]. Furthermore, Miyatake et al. [19] succeeded in identifying pathogenic repeat expansions in several patients with neurological and neuromuscular diseases. The authors concluded that ONT-based adaptive sampling is superior to conventional diagnostic methods in terms of speed, accuracy, and comprehensiveness.
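To illustrate why a read that spans an entire repeat tract is informative, the following minimal sketch counts the longest uninterrupted run of a repeat unit within a single read. The read sequence and the function name are invented for illustration. Actual long reads contain sequencing errors and interrupted repeats, so dedicated repeat-genotyping tools would be needed in practice; this sketch only conveys the principle.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Invented example read: flanking sequence, 45 CAG units, flanking sequence.
read = "ACGT" + "CAG" * 45 + "TTGA"
print(longest_repeat_run(read, "CAG"))  # 45
```

A 150-bp short read could not contain both flanking sequences of this 135-bp tract together with enough unique sequence for reliable alignment, whereas a read of several kilobases can span the tract and its flanks in a single molecule.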
In addition, long-read NGS is beneficial in determining the structure of complex rearrangements [6,7,9]. In this context, recent studies have discovered unique cellular events designated as chromothripsis or chromoanagenesis, which result in highly complex rearrangements involving one or a few chromosomes (Fig. 2) [20-22]. To date, chromothripsis-/chromoanagenesis-compatible rearrangements have been identified in multiple patients with growth failure, congenital malformations, and endocrine abnormalities [20-22]. However, whole-genome sequencing using short-read NGS and array-based CGH have difficulty determining the arrangement of chromosomal fragments in these cases. Lei et al. [23] employed long-read NGS and determined the structure of a massively rearranged chromosome in a patient with atypical Langer-Giedion syndrome and Cornelia de Lange syndrome type IV. Furthermore, long-read NGS can detect copy number neutral inversions and translocations that are usually undetectable by array-based CGH.
The human genome contains several difficult-to-sequence regions [23]. These regions often harbor homologous sequences due to segmental amplification [23]. An important example of difficult-to-sequence regions from the viewpoint of pediatric endocrinology is 6p21.33, which contains CYP21A2 [24]. CYP21A2 is the causative gene for congenital adrenal hyperplasia due to 21-hydroxylase deficiency [25]. This gene is located within a segmentally duplicated region of approximately 40 kb (Fig. 1) [25]. The presence of a pseudogene (CYP21A1P) renders CYP21A2 difficult for short-read NGS and Sanger sequencing to analyze [25]. Aberrant recombination between CYP21A2 and CYP21A1P is often observed in patients with 21-hydroxylase deficiency [25]. Thus, efforts were made to sequence this region using long-read NGS [26,27]. Adachi et al. enabled mutation screening of CYP21A2 through long-read NGS for PCR-amplified DNA fragments [27]. The authors proposed that the cost of this analysis can be reduced by using a barcode system.
Long-read NGS can be used to analyze DNA modification [6,7,9]. In particular, ONT-based NGS can analyze DNA methylation without bisulfite treatment of DNA samples. Information on DNA methylation is critical for molecular diagnosis of imprinting disorders, which explain a certain percentage of neonatal diabetes, pseudohypoparathyroidism, pubertal disorders, and short stature in individuals considered small for their gestational age [28]. Yamada et al. [29] performed DNA-methylation analyses using targeted ONT-based NGS and successfully diagnosed patients with Prader-Willi syndrome or Angelman syndrome. Furthermore, ONT-based NGS is useful in the genome-wide screening for DNA-methylation changes associated with disorders (epi-variants).
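How read-level methylation calls might be summarized for a single region can be sketched as follows. The sketch assumes that each read covering the region has already been converted into a list of per-CpG calls (True for methylated); this input format, the 0.5 cutoff, and the function names are assumptions made for illustration and do not correspond to the output of any particular tool. As a further assumption, at a normally imprinted region roughly half of the reads would be expected to be methylated, so a marked deviation from this proportion may suggest an imprinting defect.

```python
from typing import List


def read_methylation_fraction(calls: List[bool]) -> float:
    """Fraction of CpG sites called methylated on a single read."""
    return sum(calls) / len(calls)


def methylated_read_fraction(reads: List[List[bool]], cutoff: float = 0.5) -> float:
    """Fraction of covering reads whose CpG methylation exceeds `cutoff`."""
    covering = [calls for calls in reads if calls]  # ignore reads without CpG calls
    if not covering:
        raise ValueError("no reads with CpG calls in this region")
    methylated = sum(1 for calls in covering if read_methylation_fraction(calls) > cutoff)
    return methylated / len(covering)


# Invented example: two mostly methylated reads and two mostly unmethylated reads.
reads = [
    [True, True, True, True, True],
    [True, True, True, True, False],
    [False, False, False, False, False],
    [False, False, False, True, False],
]
print(methylated_read_fraction(reads))  # 0.5
```

Classifying whole reads rather than pooling individual CpG sites keeps the information that methylation is carried on single molecules, which is the property that long reads preserve.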
Another advantage of long-read NGS over short-read NGS and array-based CGH is its capability for haplotype phasing [9]. Haplotype phasing is a method to differentiate 2 alleles inherited from the mother and the father [9]. Haplotype phasing provides critical information on compound heterozygosity of mutations associated with autosomal recessive diseases; this method can determine whether 2 variants in a gene are located on the same or different alleles. Furthermore, it is useful to determine whether a de novo mutation is located on the maternally or paternally derived allele. This information helps to predict the consequences of variants of imprinted genes. The combination of haplotype phasing with DNA-methylation analysis enables researchers to determine epigenetic abnormalities on each allele. In addition, haplotype phasing can be used to clarify the parental origin of each DNA fragment of complex rearrangements and to examine the clonality of multiple mosaic variants in tumor tissues [9].
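The determination of whether 2 variants lie on the same or different alleles can be sketched with a simple read-counting approach. The sketch assumes that each long read spanning both variant sites has been summarized as a pair of observations (carries variant A, carries variant B); the read identifiers, the input format, and the majority rule are hypothetical simplifications, and established phasing software would be used for real data.

```python
from collections import Counter
from typing import Dict, Tuple


def phase_two_variants(reads: Dict[str, Tuple[bool, bool]]) -> str:
    """Infer cis/trans configuration from reads spanning both variant sites.

    In cis, a read carries both variants or neither; in trans, it carries
    exactly one of them. The configuration with more supporting reads wins.
    """
    counts = Counter(reads.values())
    cis_support = counts[(True, True)] + counts[(False, False)]
    trans_support = counts[(True, False)] + counts[(False, True)]
    if cis_support > trans_support:
        return "cis"
    if trans_support > cis_support:
        return "trans"
    return "undetermined"


# Invented example: four reads support trans; one discordant read may reflect an error.
reads = {
    "read1": (True, False),
    "read2": (False, True),
    "read3": (True, False),
    "read4": (False, True),
    "read5": (True, True),
}
print(phase_two_variants(reads))  # trans
```

A trans result for 2 pathogenic variants in a gene associated with an autosomal recessive disease would be consistent with compound heterozygosity, whereas a cis result would leave one allele intact.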
Long-read NGS has some limitations, which include a relatively high error rate and cost [6,7,9]. However, the accuracy of sequence results is continuously improving, and technical advances are helping to reduce the cost per base [9,10]. It is expected that long-read NGS will become a major platform for molecular diagnosis of congenital disorders. In this context, ONT-based NGS is characterized by rapid workflow [7]. A pilot study at Stanford University showed that ultra-rapid ONT-based NGS was able to diagnose rare genetic diseases of critically ill patients within an average of 8 hours [7]. In addition, long-read NGS has several further applications in research and clinical settings, such as cancer genetics and infection surveillance [7,9], although this is beyond the scope of the present article.
Notes
References
1. Eichler EE. Genetic variation, comparative genomics, and the diagnosis of disease. N Engl J Med. 2019; 381:64–74.
2. Fukami M, Miyado M. Next generation sequencing and array-based comparative genomic hybridization for molecular diagnosis of pediatric endocrine disorders. Ann Pediatr Endocrinol Metab. 2017; 22:90–4.
3. Lee C, Iafrate AJ, Brothman AR. Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nat Genet. 2007; 39(7 Suppl):S48–54.
4. Izumi Y, Suzuki E, Kanzaki S, Yatsuga S, Kinjo S, Igarashi M, et al. Genome-wide copy number analysis and systematic mutation screening in 58 patients with hypogonadotropic hypogonadism. Fertil Steril. 2014; 102:1130–6.
5. Suga K, Imoto I, Ito H, Naruto T, Goji A, Osumi K, et al. Next-generation sequencing for the diagnosis of patients with congenital multiple anomalies and/or intellectual disabilities. J Med Invest. 2020; 67:246–9.
6. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet. 2020; 21:597–614.
7. Oehler JB, Wright H, Stark Z, Mallett AJ, Schmitz U. The application of long-read sequencing in clinical settings. Hum Genomics. 2023; 17:73.
8. Miller DE, Sulovari A, Wang T, Loucks H, Hoekzema K, Munson KM, et al. Targeted long-read sequencing identifies missing disease-causing variation. Am J Hum Genet. 2021; 108:1436–49.
9. Conlin LK, Aref-Eshghi E, McEldrew DA, Luo M, Rajagopalan R. Long-read sequencing for molecular diagnostics in constitutional genetic disorders. Hum Mutat. 2022; 43:1531–44.
10. Mastrorosa FK, Miller DE, Eichler EE. Applications of long-read sequencing to Mendelian genetics. Genome Med. 2023; 15:42.
11. Sharp AJ, Cheng Z, Eichler EE. Structural variation of the human genome. Annu Rev Genomics Hum Genet. 2006; 7:407–42.
12. Stankiewicz P, Lupski JR. The genomic basis of disease, mechanisms and assays for genomic disorders. Genome Dyn. 2006; 1:1–16.
13. Miller DE, Hanna P, Galey M, Reyes M, Linglart A, Eichler EE, et al. Targeted long-read sequencing identifies a retrotransposon insertion as a cause of altered GNAS exon A/B methylation in a family with autosomal dominant pseudohypoparathyroidism type 1b (PHP1B). J Bone Miner Res. 2022; 37:1711–9.
14. Taşkesen M, Collin GB, Evsikov AV, Güzel A, Özgül RK, Marshall JD, et al. Novel Alu retrotransposon insertion leading to Alström syndrome. Hum Genet. 2012; 131:407–13.
15. Del Gobbo GF, Wang X, Couse M, Mackay L, Goldsmith C, Marshall AE, et al. Long-read genome sequencing reveals a novel intronic retroelement insertion in NR5A1 associated with 46,XY differences of sexual development. Am J Med Genet A. 2024; 194:e63522.
16. Pfaff AL, Singleton LM, Kõks S. Mechanisms of disease-associated SINE-VNTR-Alus. Exp Biol Med (Maywood). 2022; 247:756–64.
17. Stevanovski I, Chintalaphani SR, Gamaarachchi H, Ferguson JM, Pineda SS, Scriba CK, et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Sci Adv. 2022; 8:eabm5386.
18. Kekou K, Sofocleous C, Papadimas G, Petichakis D, Svingou M, Pons RM, et al. A dynamic trinucleotide repeat (TNR) expansion in the DMD gene. Mol Cell Probes. 2016; 30:254–60.
19. Miyatake S, Koshimizu E, Fujita A, Doi H, Okubo M, Wada T, et al. Rapid and comprehensive diagnostic method for repeat expansion diseases using nanopore sequencing. NPJ Genom Med. 2022; 7:62.
20. Liu P, Erez A, Nagamani SC, Dhar SU, Kołodziejska KE, Dharmadhikari AV, et al. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell. 2011; 146:889–903.
21. Kloosterman WP, Guryev V, van Roosmalen M, Duran KJ, de Bruijn E, Bakker SC, et al. Chromothripsis as a mechanism driving complex de novo structural rearrangements in the germline. Hum Mol Genet. 2011; 20:1916–24.
22. Hattori A, Fukami M. Established and novel mechanisms leading to de novo genomic rearrangements in the human germline. Cytogenet Genome Res. 2020; 160:167–76.
23. Lei M, Liang D, Yang Y, Mitsuhashi S, Katoh K, Miyake N, et al. Long-read DNA sequencing fully characterized chromothripsis in a patient with Langer-Giedion syndrome and Cornelia de Lange syndrome-4. J Hum Genet. 2020; 65:667–74.
24. Mantere T, Kersten S, Hoischen A. Long-read sequencing emerging in medical genetics. Front Genet. 2019; 10:426.
25. Stephens Z, Milosevic D, Kipp B, Grebe S, Iyer RK, Kocher JA. PB-Motif-A method for identifying gene/pseudogene rearrangements with long reads: an application to CYP21A2 genotyping. Front Genet. 2021; 12:716586.
26. Zhang R, Cui D, Song C, Ma X, Cai N, Zhang Y, et al. Evaluating the efficacy of a long-read sequencing-based approach in the clinical diagnosis of neonatal congenital adrenocortical hyperplasia. Clin Chim Acta. 2024; 555:117820.
27. Adachi E, Nakagawa R, Tsuji-Hosokawa A, Gau M, Kirino S, Yogi A, et al. A MinION-based long-read sequencing application with one-step PCR for the genetic diagnosis of 21-hydroxylase deficiency. J Clin Endocrinol Metab. 2024; 109:750–60.