Shane Wilson scite author profile

Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools.

Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage.

Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro.

Developing Cancer Informatics Applications and Tools Using the NCI Genomic Data Commons API

Wilson, Fitzsimons, Ferguson, et al. 2017

The NCI Genomic Data Commons (GDC) was launched in 2016 and makes available over 2 petabytes (PB) of cancer genomic and associated clinical data to the research community. This dataset continues to grow and currently includes over 14,500 patients. The GDC is an example of a biomedical data commons, which collocates biomedical data with storage and computing infrastructure and commonly used web services, software applications, and tools to create a secure, interoperable, and extensible resource for researchers. The GDC is: i) a data repository for downloading data that has been submitted to it, and also a system that: ii) applies a common set of bioinformatics pipelines to submitted data; iii) re-analyzes existing data when new pipelines are developed; and, iv) allows users to build their own applications and systems that interoperate with the GDC using the GDC Application Programming Interface (API). We describe the GDC API and how it has been used both by the GDC itself and by third parties.

Author Correction: The NCI Genomic Data Commons

et al. 2021

Mapping and Sequencing the Human Genome.

Wilson, 1989, Biometrics

The NCI Genomic Data Commons

CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes
