Abstract
Purpose
Identifying microbial communities by 16S ribosomal RNA (rRNA) gene sequencing is a popular approach in microbiome studies, and various software tools and data resources have been developed for microbial analysis. This study aimed to compare the performance of available software tools and reference sequence databases in differentiating subject samples from negative controls.
Methods
We collected 4 negative control samples and 2 respiratory samples from a healthy subject, each using a different acquisition protocol. The taxonomic compositions of these 6 samples were compared quantitatively while varying the configuration of the analysis software tools and the reference databases.
Results
Taxonomy assignments differed relatively little across pipeline configurations and reference databases. Nevertheless, changing the software configuration produced larger discrepancies than changing the reference database. The 4 negative controls were clearly separable from the 2 subject samples. Additionally, samples obtained with different acquisition protocols tended to be differentiated.
Conclusion
Our results suggest that microbial compositions differ little between software tools and reference databases, but certain configurations can improve the separability of samples. Changing software tools has a greater impact on results than changing reference databases; thus, configurations should be chosen to suit the objectives of each study.
Table 1.
Normal saline and the protected brushes used for sample acquisition were sterilized. Four negative controls were obtained with different acquisition procedures: normal saline alone (NC1), normal saline in which a protected brush was immersed (NC2), normal saline in which a protected brush that had been passed through the bronchoscope was immersed (NC3), and normal saline used to wash the bronchoscope channel (NC4). Two subject samples were acquired from a healthy subject by protected brush (S1) and bronchial washing (S2).