William C Koehler scite author profile

Vand², Koehler³ et al. 2018

Preprint

Cardiovascular disease (CVD) is the leading cause of death worldwide, causing over 17M deaths per year, which outpaces global cancer mortality rates. Despite these sobering statistics, the state-of-the-art in computational infrastructure to study datasets associated with CVD has lagged far behind public resources widely available in the oncology field, where improved data science and visualization methods have led to the development of large-scale cancer genomics resources like MSKCC's cBioPortal or NCI's Genomic Data Commons (GDC) Portal. Developing a similar user-friendly computational platform could significantly lower the barriers between complex CVD data and researchers who want rapid, intuitive, and high-quality visual access to molecular profiles and clinical attributes from existing CVD projects. Here we present HeartBioPortal: a publicly available web application that provides intuitive visualization, analysis, and downloads of large-scale CVD data currently focused on gene expression, genetic association, and ancestry information. By democratizing access to anonymized CVD data, HeartBioPortal's aim is to integrate relevant omics and clinical information across the biological dataverse to support CVD clinicians and researchers.

HeartBioPortal

Circ: Genomic and Precision Medicine

Vand², Koehler³ et al. 2019

Cardiovascular disease (CVD) is the leading cause of death worldwide, responsible for over 17 million deaths annually, a rate which outpaces even that related to cancer. Despite these sobering statistics, the state-of-the-art in computational infrastructure for the study of contemporary datasets related to CVD lags substantially behind that widely available in oncology, where improved data science and visualization methods have delivered publicly available comprehensive cancer genomics resources like Memorial Sloan Kettering Cancer Center's cBioPortal 1,2 and the National Cancer Institute's Genomic Data Commons (GDC) Portal 3,4. In our view, such portals do an outstanding job of transforming data from The Cancer Genome Atlas (TCGA) into logical data visualizations that provide additional biological insight. Developing a similar user-friendly computational platform for CVD could significantly lower the barriers of discovery by providing researchers with rapid, intuitive,

Optimized functional annotation of ChIP-seq data

Koehler², Booven et al. 2016

Preprint

Motivation: Different ChIP-seq peak callers often produce different output results from the same input. Since different peak callers are known to produce differentially enriched peaks with a large variance in peak length distribution and total peak count, accurately annotating peak lists with their nearest genes can be an arduous process. Functional genomic annotation of histone modification ChIP-seq data can be a particularly challenging task, as chromatin marks that have inherently broad peaks with a diffuse range of signal enrichment (e.g., H3K9me1, H3K27me3) differ significantly from narrow peaks that exhibit a compact and localized enrichment pattern (e.g., H3K4me3, H3K9ac). In addition, varying degrees of tissue-dependent broadness of an epigenetic mark can make it difficult to accurately and reliably link sequencing data to biological function. Thus, it would be useful to develop a software program that can precisely tailor the computational analysis of a ChIP-seq dataset to the specific peak coordinates of the data. Results: geneXtendeR is an R/Bioconductor package that optimizes the functional annotation of ChIP-seq peaks using fast iterative peak-coordinate/GTF alignment algorithms focused on cis-regulatory regions and proximal-promoter regions of nearest genes. The goal of geneXtendeR is to robustly link differentially enriched peaks with their respective genes, thereby aiding experimental follow-up and validation in designing primers for a set of prospective gene candidates during qPCR. We have tested geneXtendeR on 547 human transcription factor ChIP-seq ENCODE datasets and 214 human histone modification ChIP-seq ENCODE datasets, providing the analysis results as case studies. Availability: The geneXtendeR R/Bioconductor package (including detailed introductory vignettes) is available under the GPL-3 Open Source license and is freely available to download from Bioconductor at: https://bioconductor.org/packages/devel/geneXtendeR/.

Optimized functional annotation of ChIP-seq data

Koehler², Booven et al. 2019

F1000Res

Different ChIP-seq peak callers often produce different output results from the same input. Since different peak callers are known to produce differentially enriched peaks with a large variance in peak length distribution and total peak count, accurately annotating peak lists with their nearest genes can be an arduous process. Functional genomic annotation of histone modification ChIP-seq data can be a particularly challenging task, as chromatin marks that have inherently broad peaks with a diffuse range of signal enrichment (e.g., H3K9me1, H3K27me3) differ significantly from narrow peaks that exhibit a compact and localized enrichment pattern (e.g., H3K4me3, H3K9ac). In addition, varying degrees of tissue-dependent broadness of an epigenetic mark can make it difficult to accurately and reliably link sequencing data to biological function. Thus, there exists an unmet need to develop a software program that can precisely tailor the computational analysis of a ChIP-seq dataset to the specific peak coordinates of the data and its surrounding genomic features. geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. In contrast to prior techniques, geneXtendeR considers peak annotations beyond just the closest gene, allowing users to investigate peak summary statistics for the first-closest gene, second-closest gene, ..., nth-closest gene whilst ranking the output according to biologically relevant events and iteratively comparing the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions on the original boundaries of each gene's coordinates. We tested geneXtendeR on 547 human transcription factor ChIP-seq ENCODE datasets and 198 human histone modification ChIP-seq ENCODE datasets, providing the analysis results as case studies. 
The geneXtendeR R/Bioconductor package (including detailed introductory vignettes) is available under the GPL-3 Open Source license and is freely available to download from Bioconductor at: https://bioconductor.org/packages/geneXtendeR/
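
The idea of comparing peak-to-gene overlap across a range of upstream and downstream gene extensions can be illustrated with a small sketch. This is not geneXtendeR's code or API (the package is written in R); the Python names below (`Gene`, `Peak`, `extend`, `sweep`) and the toy coordinates are hypothetical, chosen only to show how the number of peaks landing on genes might change as gene boundaries are widened in a strand-aware way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int


def extend(gene: Gene, upstream: int, downstream: int) -> Gene:
    """Return a copy of the gene widened by strand-aware extensions (in bp)."""
    if gene.strand == "+":
        start, end = gene.start - upstream, gene.end + downstream
    else:
        # On the minus strand, "upstream" lies past the gene's end coordinate.
        start, end = gene.start - downstream, gene.end + upstream
    return Gene(gene.name, gene.chrom, max(1, start), end, gene.strand)


def overlaps(peak: Peak, gene: Gene) -> bool:
    """True if the peak and gene share at least one base on the same chromosome."""
    return (
        peak.chrom == gene.chrom
        and peak.start <= gene.end
        and gene.start <= peak.end
    )


def count_peaks_on_genes(peaks, genes, upstream, downstream=0):
    """Count peaks that overlap at least one extended gene."""
    extended = [extend(g, upstream, downstream) for g in genes]
    return sum(any(overlaps(p, g) for g in extended) for p in peaks)


def sweep(peaks, genes, upstream_values, downstream=0):
    """Map each upstream extension to the number of peaks overlapping a gene."""
    return {
        u: count_peaks_on_genes(peaks, genes, u, downstream)
        for u in upstream_values
    }


# Toy data (made up for illustration only).
genes = [
    Gene("geneA", "chr1", 1000, 2000, "+"),
    Gene("geneB", "chr1", 5000, 6000, "-"),
]
peaks = [
    Peak("chr1", 1500, 1600),  # inside geneA
    Peak("chr1", 700, 800),    # 200 bp upstream of geneA
    Peak("chr1", 6400, 6500),  # 400 bp upstream of geneB (minus strand)
    Peak("chr2", 100, 200),    # different chromosome, never matches
]

result = sweep(peaks, genes, [0, 250, 500])
print(result)
```

In this toy case, each wider extension captures one more peak, which is the kind of per-extension comparison a user could inspect when choosing gene boundaries. A real analysis would read gene coordinates from a GTF file and use an interval index rather than the quadratic scan shown here.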