Next generation sequencing and urologic cancer

Seon-Kyu Kim; Wun-Jae Kim

doi:10.4111/kju.2015.56.2.87

Nucleic acid sequencing is a method for identifying the exact order of nucleotides in a given DNA or RNA molecule. Since the Human Genome Project completed the first human genome sequence by using first-generation sequencing (also known as Sanger sequencing), the use of nucleic acid sequencing has greatly increased in research and clinical investigations worldwide, with a resultant demand for cheaper and faster sequencing methods. This demand has driven the development of next-generation sequencing (NGS), a massively parallel approach that allows an entire genome to be sequenced from millions of DNA fragments in less than 1 day. During the last decade, several NGS platforms, such as the Illumina HiSeq series, SOLiD, and 454, have been developed to provide low-cost, high-throughput sequencing. These platforms have made sequencing accessible to more laboratories and have facilitated the discovery of genes and regulatory elements associated with diseases, including urologic cancers.

The application of NGS has gradually broadened, allowing for rapid advances in many fields associated with the biomedical sciences and clinical research. Whole-genome sequencing (also known as full-genome sequencing, complete genome sequencing, or entire genome sequencing) is a laboratory process that determines the complete DNA sequence of a sample's genome at a single time. This entails sequencing of chromosomal DNA as well as the DNA contained in the mitochondria. In the future of personalized medicine, whole-genome sequence data will be an important tool for guiding therapeutic interventions. Sequencing at the single-nucleotide polymorphism (SNP) level is also used to pinpoint functional DNA variants from association studies and may lay the foundation for predicting disease susceptibility and drug response. Additionally, gene expression studies using RNA-seq (also known as whole-transcriptome sequencing) have begun to replace DNA microarray analysis, providing the ability to examine RNA expression at the sequence level. RNA-seq is an experimental protocol that uses NGS technologies to sequence the RNA molecules within a biological sample in order to determine the relative abundance of each RNA.
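As a rough illustration of what "relative abundance" can mean in practice, the sketch below converts raw read counts into transcripts per million (TPM), one commonly used length-normalized measure. The choice of TPM, the gene names, the read counts, and the transcript lengths are all assumptions made for illustration and are not taken from any study discussed here.

```python
def tpm(counts, lengths_bp):
    """Return transcripts per million (TPM) for each gene.

    counts     -- mapping of gene name to number of mapped reads
    lengths_bp -- mapping of gene name to transcript length in base pairs
    """
    # Step 1: reads per kilobase, which corrects for transcript length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        raise ValueError("no reads were counted for any gene")
    # Step 2: rescale so that the values across all genes sum to one million.
    return {gene: value / total_rpk * 1_000_000 for gene, value in rpk.items()}


# Hypothetical read counts and transcript lengths for three genes.
example_counts = {"geneA": 500, "geneB": 1000, "geneC": 250}
example_lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500}

abundance = tpm(example_counts, example_lengths)
for gene, value in sorted(abundance.items()):
    print(f"{gene}: {value:.0f} TPM")
```

In this toy data set, geneB has the most reads but also the longest transcript, so after length normalization its relative abundance is lower than that of geneA or geneC.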

Although the entire human genome can be sequenced by whole-genome sequencing, researchers and clinicians are typically interested in only the protein-coding regions of the genome, which are collectively referred to as the exome. The exome makes up just over 1% of the genome. Thus, whole-exome sequencing is much more cost-effective than whole-genome sequencing, because it provides sequence information for the protein-coding regions while sequencing only a small fraction of the genome.
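The scale of this difference can be sketched with simple arithmetic. The calculation below assumes a genome of roughly 3.1 billion base pairs, rounds the exome to 1% of that, and uses hypothetical mean depths of 30x for the genome and 100x for the exome; none of these depth figures come from the text, and real costs also depend on capture efficiency and library preparation, which are ignored here.

```python
GENOME_BP = 3_100_000_000   # approximate human genome size (assumption)
EXOME_PERCENT = 1           # the text says "just over 1%"; rounded to 1 here


def bases_required(target_bp, mean_depth):
    """Total bases that must be sequenced to cover a target at a mean depth."""
    return target_bp * mean_depth


# Integer arithmetic keeps the result exact.
exome_bp = GENOME_BP * EXOME_PERCENT // 100

wgs_bases = bases_required(GENOME_BP, 30)    # hypothetical 30x genome
wes_bases = bases_required(exome_bp, 100)    # hypothetical 100x exome

print(f"Exome size:           {exome_bp:,} bp")
print(f"Whole genome at 30x:  {wgs_bases:,} bases")
print(f"Whole exome at 100x:  {wes_bases:,} bases")
print(f"Ratio (genome/exome): {wgs_bases // wes_bases}x")
```

Under these assumptions, the exome requires far fewer sequenced bases than the genome even when it is covered at a higher depth.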

These broad applications only skim the surface of what NGS can offer to translational researchers and clinicians. As NGS continues to grow in popularity, additional innovative applications will inevitably emerge.

NGS is uncovering human genome diversity at both the germ-line level (population) and the somatic level (tumors). Several cooperative groups, such as the International Cancer Genome Consortium and The Cancer Genome Atlas (TCGA) Research Network, provide platforms for effective worldwide collaboration in various cancers. Since 2010, whole-genome and whole-exome sequencing data of urologic cancers, including kidney cancer, bladder cancer, and prostate cancer (PCa), have become available to the research community. Studies from the Wellcome Trust have shown a high frequency of inactivation of polybromo 1 (PBRM1), in 40% of renal cell carcinomas (RCCs), and recurrent mutations in genes involved in histone methylation: SET domain containing 2 (SETD2), lysine (K)-specific demethylase 5C (KDM5C), and lysine (K)-specific demethylase 6A (KDM6A) [1]. BRCA1-associated protein-1 (BAP1) was found to be inactivated in 15% of RCCs [2]. PBRM1 and BAP1 mutations are generally mutually exclusive [3], which was confirmed by the TCGA study. SETD2-associated DNA methylation subsets were identified by TCGA, which provided evidence demonstrating that multiple RNA-based tumor subtypes are characterized by mutations in PBRM1 and the mechanistic target of BAP1 [4].

In bladder cancer, whole-exome sequencing has revealed mutations in chromatin modifiers/remodelers, including AT-rich interactive domain 1A (ARID1A), KDM6A, CREB binding protein (CREBBP), E1A binding protein p300 (EP300), myeloid/lymphoid or mixed-lineage leukemia 1 (MLL1), MLL2, MLL3, and nuclear receptor corepressor 1 (NCOR1). Recent studies have highlighted alterations in stromal antigen 2 (STAG2) and other genes involved in chromosome segregation that act as tumor suppressors [5]. The TCGA has also identified mutations in thioredoxin interacting protein (TXNIP), a nuclear factor involved in the response to oxidative stress, in 15% of tumors, and has pointed to putative therapeutic targets including receptor tyrosine kinases (FGFR3 and ERBB1-3) and nuclear receptors. The completion of the TCGA project and its expansion to other populations will provide a more comprehensive map of aggressive bladder tumors.

Initial studies in PCa have focused on broad characterization. The identification of complex multichromosomal genomic rearrangement loops and of recurrent mutations in speckle-type POZ protein (SPOP), forkhead box A1 (FOXA1), MLL2, and mediator complex subunit 12 (MED12) has provided new insights into PCa genetics. Subsequent work in PCa has focused on specific disease subgroups. A sophisticated study of pretreatment metastatic PCa revealed mutational profiles comparable to those of earlier-stage disease and a functional impact of FOXA1 mutations on androgen responsiveness [6]. Another focused report on 11 early-onset tumors showed distinct, age-related mutational profiles as well as sustained dysregulation of androgen signaling [7]. These large-scale projects will answer some fundamental questions about PCa, such as which mutational profiles drive PCa and how mutations relate to patient phenotypes.

To obtain appropriate biological insights from NGS-based studies, rigorous bioinformatics procedures are needed to process the large volumes of data generated. Once sequencing is completed, the raw sequence data must undergo multiple steps of analysis. A generalized analysis pipeline for NGS data includes preprocessing of the data to remove adapter sequences and low-quality reads, mapping of the reads to a reference genome or de novo assembly of the reads, and analysis of the compiled sequence. Analysis of the sequence can include a wide variety of bioinformatics assessments, including genetic variant calling for detection of SNPs or indels (i.e., insertions or deletions of bases), detection of novel genes or regulatory elements, and assessment of transcript expression levels. Analysis can also include identification of both somatic and germ-line mutation events that may contribute to the diagnosis of a disease or genetic condition. Many free web-based tools and software packages are available to perform these analyses.
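The pipeline steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a substitute for established tools: the adapter string, the quality and length thresholds, the Phred+33 quality encoding, the reference sequence, and the reads are all illustrative, and the mapping step is replaced by hand-supplied alignment positions because real read mapping requires a dedicated aligner.

```python
from collections import Counter, defaultdict

ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter sequence (assumption)


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut a read at the first exact adapter match (no mismatches tolerated)."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(char) - offset for char in qual) / len(qual)


def preprocess(reads, min_mean_q=20, min_len=5):
    """Trim adapters, then drop reads that are too short or too low in quality."""
    kept = []
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual)
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            kept.append((seq, qual))
    return kept


def call_snps(reference, aligned, min_depth=3, min_alt_fraction=0.3):
    """Naive SNP caller over ungapped alignments given as (0-based start, seq)."""
    pileup = defaultdict(Counter)
    for start, seq in aligned:
        for i, base in enumerate(seq):
            if start + i < len(reference):
                pileup[start + i][base] += 1
    calls = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        # Non-reference bases at this position, most frequent first.
        alts = [(b, n) for b, n in pileup[pos].most_common() if b != reference[pos]]
        if depth >= min_depth and alts and alts[0][1] / depth >= min_alt_fraction:
            calls.append((pos, reference[pos], alts[0][0]))
    return calls


reference = "ACGTACGTAC"
raw_reads = [
    ("ACGTAC" + ADAPTER, "I" * 19),  # adapter read-through, high quality
    ("CGTTCG", "IIIIII"),
    ("GTTCGT", "IIIIII"),
    ("TTCGTA", "IIIIII"),
    ("ACGTAC", "######"),            # very low quality, expected to be dropped
]

clean_reads = preprocess(raw_reads)
starts = [0, 1, 2, 3]  # hypothetical mapping positions, standing in for an aligner
aligned = list(zip(starts, (seq for seq, _ in clean_reads)))
snps = call_snps(reference, aligned)
print(snps)
```

In this example, four of the five reads survive preprocessing, and the caller reports a single A-to-T substitution at position 4 (0-based), where three of the four overlapping reads carry the alternate base. Production pipelines additionally handle mismatches in adapters, gapped alignments, base-level quality, and statistical models of sequencing error.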

The substantial power of NGS is expected to be revealed at additional levels, including the understanding of tumor progression, prevention, and early detection. NGS technology is already providing new knowledge about the molecular pathogenesis of cancer. The common players involved in the major urologic cancers have likely been identified; however, many of the mutations remain poorly characterized. In the next several years, more therapeutically targetable mutations will be discovered, with scientific consideration of differences according to tumor subtype or population characteristics (e.g., gender, age, and ethnicity). The recent approval of NGS technology for clinical application by the US Food and Drug Administration sets the stage for a transformation in precision medicine.

References