2014
DOI: 10.1038/srep06957

Assessment of quality control approaches for metagenomic data analysis

Abstract: Currently there is an explosive increase in next-generation sequencing (NGS) projects and related datasets, which must be processed by quality control (QC) procedures before they can be used for omics analysis. A QC procedure usually includes identification and filtration of sequencing artifacts, such as low-quality reads and contaminating reads, which can significantly affect, and sometimes mislead, downstream analysis. Quality control of NGS data for microbial communities is especially challenging.…
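As a rough illustration of the low-quality-read filtration the abstract refers to, the sketch below keeps only reads whose mean Phred score clears a cutoff. This is a minimal Python sketch, not the paper's actual QC procedure; the input/output file names and the cutoff of 20 are assumptions, and it requires Biopython.

```python
# Minimal sketch: drop reads whose mean Phred quality is below a cutoff.
# File names and the cutoff are assumptions, not values from the paper.
from Bio import SeqIO  # Biopython

MIN_MEAN_QUALITY = 20  # assumed cutoff

def quality_filter(in_path: str, out_path: str, cutoff: int = MIN_MEAN_QUALITY) -> int:
    """Copy reads whose mean Phred score is >= cutoff; return the number kept."""
    kept = (
        rec for rec in SeqIO.parse(in_path, "fastq")
        if len(rec) > 0
        and sum(rec.letter_annotations["phred_quality"]) / len(rec) >= cutoff
    )
    return SeqIO.write(kept, out_path, "fastq")

if __name__ == "__main__":
    n = quality_filter("reads.fastq", "reads.filtered.fastq")
    print(f"kept {n} reads")
```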

Cited by 48 publications (27 citation statements) | References 14 publications (17 reference statements)

Citation statements
“…Therefore, we generated five metagenomic samples simulating a human oral community with 13 bacterial species, including a variable amount of human reads as contaminants. The relative proportions of the bacterial genomes followed those suggested by Zhou et al. [48] (Supplementary Table S1). The percentage of contaminant reads was 1, 5, 25, 50, and 80%, in line with the amounts observed in the literature for human samples [26,49,50].…”
Section: Simulation Study (supporting)
confidence: 69%
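The mixing scheme the quoted study describes is easy to picture in code. The sketch below is a hypothetical illustration, assuming two in-memory read pools (bacterial_reads, human_reads); the five contamination levels come from the quote, everything else is an assumption.

```python
# Hypothetical sketch of building simulated samples with a fixed fraction of
# human "contaminant" reads, as in the quoted simulation study. The read
# pools and sampling scheme are assumptions, not the study's actual method.
import random

CONTAMINATION_LEVELS = [0.01, 0.05, 0.25, 0.50, 0.80]  # 1-80%, from the quote

def make_sample(bacterial_reads, human_reads, n_reads, contam_frac, seed=0):
    """Draw n_reads total, with contam_frac of them taken from the human pool."""
    rng = random.Random(seed)
    n_human = round(n_reads * contam_frac)
    sample = (rng.choices(human_reads, k=n_human)
              + rng.choices(bacterial_reads, k=n_reads - n_human))
    rng.shuffle(sample)  # interleave contaminants with bacterial reads
    return sample
```

In practice such samples would more likely be produced with a dedicated read simulator run against the 13 reference genomes rather than by resampling existing reads as done here.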
“…The pipeline has three common stages: quality control, read trimming, and a host screen. Together these make up the preprocessing stages, which increase the power of downstream analyses [43]. It starts by using FastQC [36] to calculate general quality-control (QC) metrics, such as per-base qualities, followed by Trimmomatic [37], which removes poor-quality base calls and potential adapter sequences, and finally host removal by reference-based mapping with the Bowtie2 [44] aligner against a panel of expected host sequences, such as human or porcine.…”
Section: Analysis Pipeline (mentioning)
confidence: 99%
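A minimal sketch of how those three stages might be wired together is shown below. It assumes the fastqc, trimmomatic (wrapper script), and bowtie2 executables are on PATH, that a prebuilt Bowtie2 host index named host_idx exists, and that adapters.fa holds the adapter sequences; none of these names come from the cited paper.

```python
# Sketch of the three preprocessing stages described above
# (FastQC -> Trimmomatic -> Bowtie2 host removal), wired up via subprocess.
# Tool availability, file names, adapter file, and host index are assumptions.
import os
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

os.makedirs("fastqc_out", exist_ok=True)

# 1) General QC metrics (per-base qualities, adapter content, ...)
run(["fastqc", "reads_1.fastq", "reads_2.fastq", "-o", "fastqc_out"])

# 2) Trim poor-quality base calls and adapter sequence (paired-end mode)
run(["trimmomatic", "PE",
     "reads_1.fastq", "reads_2.fastq",
     "trim_1P.fastq", "trim_1U.fastq", "trim_2P.fastq", "trim_2U.fastq",
     "ILLUMINACLIP:adapters.fa:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36"])

# 3) Host screen: keep only read pairs that do NOT align to the host index
run(["bowtie2", "-x", "host_idx",
     "-1", "trim_1P.fastq", "-2", "trim_2P.fastq",
     "--un-conc", "host_removed.fastq",   # non-host pairs written here
     "-S", os.devnull])                   # discard the SAM alignments
```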
“…Work initiated in the area of standardizing laboratory procedures includes much-needed development of reference materials and improved DNA extraction methods, as exemplified by the establishment of the International Metagenomics and Microbiome Standards Alliance (IMMSA) (NIST, 2016). Controls used to assess artifacts and bias introduced by sample preparation, PCR, and sequencing are reviewed elsewhere (Pinto and Raskin, 2012; Elbrecht and Leese, 2015; Pedersen et al., 2015; Tan et al., 2015; Zhou et al., 2015; Aylagas et al., 2016; Danovaro et al., 2016) and should be used in conjunction with standard quality-control pipelines for sequence quality, including chimera removal (Smyth et al., 2010; Teeling and Glöckner, 2012; Zhou et al., 2014; Escobar-Zepeda et al., 2015; Jeon et al., 2015). Including mock communities in DNA sequencing efforts can reveal technical issues such as incomplete DNA extraction or library preparation, PCR, and sequencing errors (Schirmer et al., 2015) and provide correction factors (Tan et al., 2015; Aylagas et al., 2016).…”
Section: Controls and Replication (mentioning)
confidence: 99%
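To make the correction-factor idea concrete, here is a toy sketch: per-taxon factors are derived by comparing a mock community's known composition with its observed abundances, then applied to a sample. The abundance numbers are invented placeholders, and the published procedures (Tan et al., 2015; Aylagas et al., 2016) differ in detail.

```python
# Toy sketch of mock-community correction factors. Abundances are invented
# placeholders; this is not the procedure from any of the cited papers.
expected = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}  # known mock composition
observed = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}  # measured by sequencing

# Factor > 1 means the taxon was under-detected, < 1 means over-detected.
factors = {taxon: expected[taxon] / observed[taxon] for taxon in expected}

def correct(sample_abundances):
    """Rescale a sample's relative abundances by the mock-derived factors."""
    adjusted = {t: a * factors.get(t, 1.0) for t, a in sample_abundances.items()}
    total = sum(adjusted.values())
    return {t: a / total for t, a in adjusted.items()}  # renormalize to sum to 1
```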