Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus's fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses on the mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of nonequilibrium viral evolution driven by patient-specific immune responses and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
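To make the idea of ranking strains with a spin model concrete, the following is a minimal sketch rather than the authors' method. It assumes a binary (Ising-type) encoding in which each site is 0 for the consensus residue and 1 for a mutant, and an energy E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j in which lower energy is read as higher fitness. The encoding, the sign convention, the parameter values, and the function names `ising_energy` and `rank_by_energy` are all illustrative assumptions; none of them is given in the abstract.

```python
import numpy as np

def ising_energy(s, h, J):
    """Energy of one binary sequence under an assumed Ising-type model.

    s : sequence of 0/1 values, one per site (assumed encoding: 1 = mutant)
    h : array of per-site fields
    J : symmetric matrix of pairwise couplings (diagonal ignored)
    """
    s = np.asarray(s, dtype=float)
    # Use the strict upper triangle so each pair (i, j) is counted once.
    pair_term = s @ np.triu(J, k=1) @ s
    return -(h @ s) - pair_term

def rank_by_energy(seqs, h, J):
    """Return sequence indices sorted from lowest to highest energy.

    Under the convention assumed here, lowest energy means highest
    inferred fitness.
    """
    energies = np.array([ising_energy(s, h, J) for s in seqs])
    return np.argsort(energies), energies

# Hypothetical two-site example with made-up parameters.
h = np.array([-1.0, -2.0])                 # both mutations assumed costly
J = np.array([[0.0, 0.5], [0.5, 0.0]])     # assumed compensatory coupling
seqs = [[0, 0], [1, 0], [0, 1], [1, 1]]
order, energies = rank_by_energy(seqs, h, J)
```

The point of the sketch is that only the rank order of the energies matters for the claim in the abstract: two models that disagree on absolute energies can still agree on which mutant strains are fitter.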
Gene co-expression networks capture biological relationships between genes and are important tools in predicting gene function and understanding disease mechanisms. We show that technical and biological artifacts in gene expression data confound commonly used network reconstruction algorithms. We demonstrate theoretically, in simulation, and empirically, that principal component correction of gene expression measurements prior to network inference can reduce false discoveries. Using data from the GTEx project in multiple tissues, we show that this approach reduces false discoveries beyond correcting only for known confounders.
Electronic supplementary material
The online version of this article (10.1186/s13059-019-1700-9) contains supplementary material, which is available to authorized users.
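The principal component correction described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the expression matrix is taken to be samples by genes, the number of components `k` is chosen by the user (the abstract does not say how it is selected), and the network is built by thresholding absolute Pearson correlations, which stands in for whatever reconstruction algorithm is actually used. The function names and the simulated data are hypothetical.

```python
import numpy as np

def remove_top_pcs(X, k):
    """Remove the top-k principal components from X (samples x genes)."""
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    S_res = S.copy()
    S_res[:k] = 0.0                              # zero out the leading components
    return (U * S_res) @ Vt                      # residual expression matrix

def coexpression_edges(X, threshold):
    """Boolean adjacency matrix from thresholded absolute correlation."""
    C = np.corrcoef(X, rowvar=False)             # gene-by-gene correlation
    np.fill_diagonal(C, 0.0)                     # ignore self-correlation
    return np.abs(C) > threshold

def simulate_confounded(n_samples=200, n_genes=20, seed=0):
    """Independent genes plus one shared artifact (e.g. a batch effect)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n_samples, n_genes))
    artifact = rng.normal(size=(n_samples, 1))   # one value per sample
    return noise + 2.0 * artifact                # artifact hits every gene

X = simulate_confounded()
edges_raw = coexpression_edges(X, threshold=0.5)
edges_corrected = coexpression_edges(remove_top_pcs(X, k=1), threshold=0.5)
n_raw = int(edges_raw.sum()) // 2                # each edge is counted twice
n_corrected = int(edges_corrected.sum()) // 2
```

In this simulation the genes are independent by construction, so every edge is a false discovery. Comparing `n_raw` with `n_corrected` shows how a shared artifact inflates correlations and how removing the leading component reduces them. Removing too many components from real data could also remove genuine biological signal, so the choice of `k` matters.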
Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. There are several existing batch adjustment tools for ‘-omics’ data, but they do not indicate a priori whether adjustment needs to be conducted or how correction should be applied. We present a software pipeline, BatchQC, which addresses these issues using interactive visualizations and statistics that evaluate the impact of batch effects in a genomic dataset. BatchQC can also apply existing adjustment tools and allow users to evaluate their benefits interactively. We used the BatchQC pipeline on both simulated and real data to demonstrate the effectiveness of this software toolkit.
Availability and Implementation: BatchQC is available through Bioconductor: http://bioconductor.org/packages/BatchQC and GitHub: https://github.com/mani2012/BatchQC.
Contact: wej@bu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Gene co-expression networks can capture biological relationships between genes, and are important tools in predicting gene function and understanding disease mechanisms. We show that artifacts such as batch effects in gene expression data confound commonly used network reconstruction algorithms. We then demonstrate, both theoretically and empirically, that principal component correction of gene expression measurements prior to network inference can reduce false discoveries. Using expression data from the GTEx project in multiple tissues and hundreds of individuals, we show that this approach improves precision and recall in the reconstructed networks.
Groups of genes function together to perform distinct cellular processes, which are often supported by coordinated expression of functionally related genes. Based on this, gene co-expression networks seek to identify transcriptional patterns that are indicative of functional interactions and regulatory relationships between genes [1-3]. The true in vivo functional interactions between genes are not fully characterized for most species, tissues, and disease-relevant contexts. Reconstruction of co-expression networks from high-throughput measurements is therefore of common interest. However, accurate reconstruction of such networks remains a challenging problem.