Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a prerequisite for translational genomic medicine; however, low concordance between analytic pipelines remains an outstanding challenge. We processed one European and one African WGS sample with 70 analytic pipelines, formed by combining 7 short-read aligners with 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3 to 3.4). The similarity between variant call sets was determined more by the VCA than by the short-read aligner. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11 to 0.92; Wald tests, P < 0.001), with more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparably to the pipeline ensembles for ‘callable’ regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrate the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes.
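The comparison described in this abstract can be illustrated with a small sketch. It enumerates a 7 x 10 grid of aligner and VCA combinations, then computes a max/min ratio of call counts and a pairwise similarity between call sets. Everything here is an assumption made for illustration: the aligner and VCA names are placeholders, the variants are toy data, and the Jaccard index stands in for the concordance measure, whose exact definition the abstract does not give.

```python
from itertools import combinations, product

# Placeholder names; the abstract does not list the 7 aligners or 10 VCAs.
aligners = [f"aligner_{i}" for i in range(1, 8)]
vcas = [f"vca_{j}" for j in range(1, 11)]
pipelines = list(product(aligners, vcas))  # every aligner paired with every VCA


def jaccard(a, b):
    """Shared variants divided by all variants seen in either call set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def max_min_ratio(call_sets):
    """Largest call count divided by the smallest across pipelines."""
    counts = [len(calls) for calls in call_sets.values()]
    return max(counts) / min(counts)


# Toy call sets keyed by a hypothetical pipeline label.
# Each variant is (chromosome, position, reference allele, alternate allele).
toy_calls = {
    "p1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    "p2": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "p3": {
        ("chr1", 100, "A", "G"),
        ("chr2", 50, "G", "A"),
        ("chr3", 10, "T", "C"),
        ("chr3", 20, "T", "G"),
    },
}

# Similarity for every unordered pair of pipelines.
pairwise = {
    (a, b): jaccard(toy_calls[a], toy_calls[b])
    for a, b in combinations(sorted(toy_calls), 2)
}

if __name__ == "__main__":
    print(len(pipelines), "pipelines")
    print("max/min ratio:", max_min_ratio(toy_calls))
    for pair, score in pairwise.items():
        print(pair, round(score, 2))
```

Representing each variant as a tuple keeps the set operations exact, but a real comparison would first need to normalize variant representation (for example, left-aligning indels) so that equivalent calls from different VCAs match.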
Summary: High-throughput screening of the host transcriptional response to various viral infections provides a wealth of data, but using microarray and next-generation sequencing (NGS) data for analysis can be difficult. The Host Transcriptional Response DataBase (HoTResDB) allows visitors to access already processed microarray and NGS data from non-human primate models of viral hemorrhagic fever to better understand the host transcriptional response. Availability: HoTResDB is freely available at http://hotresdb.bu.edu
Genome sequencing is positioned as a routine clinical work-up for diverse clinical conditions. A commonly used approach to highlight candidate variants with potential clinical implications is to search locus- and gene-centric knowledge databases.