2017
DOI: 10.1093/bioinformatics/btx547

Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples

Abstract: Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailore…

Cited by 41 publications (71 citation statements)
References 14 publications (20 reference statements)
“…Of the 30 362 alternate splice junctions identified, 24% were partially novel resulting from alternate splice acceptor or donor sites while 8% were completely novel as shown in Figure D. All novel splice junctions were queried using Snaptron. Specifically, novel junctions were used to query Snaptron's SRAv2, GTEx, and TCGA compilations, which together index over 70 000 human RNA‐seq runs.…”
Section: Results (mentioning)
confidence: 99%
“…Exon–exon splice junctions from the alignment data were detected and annotated using “junction_annotation.py” tool of RSeQC v2.6.4. All novel category of splice junctions were queried using Snaptron to detect their occurrence and coverage details in all samples across the recount2 resource. recount2 summarizes splice‐junction evidence from 50 099 human run accessions in the Sequence Read Archive (SRA), 9662 accessions from 551 individuals in the GTEx project and 11 350 accessions from 10 340 individuals in the cancer genome atlas (TCGA) project.…”
Section: Methods (mentioning)
confidence: 99%
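
The three accession counts quoted in the statement above can be summed to check them against the "over 70 000" figure given in the abstract. This is only an arithmetic sketch: the counts are copied from the quoted text, while the dictionary name and its keys are illustrative labels, not part of Snaptron or recount2.

```python
# Run-accession counts as quoted in the citation statement above.
# The variable name and keys are illustrative labels only.
recount2_runs = {
    "SRA": 50_099,
    "GTEx": 9_662,
    "TCGA": 11_350,
}

# Total number of run accessions across the three compilations.
total_runs = sum(recount2_runs.values())
print(total_runs)  # 71111
```

The total, 71 111 run accessions, is consistent with the "over 70 000" samples cited elsewhere on this page.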
“…For the non-cancer samples, the term "cancer" was explicitly added as an excluded ontology term in the query, and the resulting files were filtered to remove any samples with "tumor" in the sample_name field. The resulting accession numbers were queried against the Snaptron junction database using the query snaptron tool (Wilks et al, 2018), yielding junctions for the tissue and cell types of interest that were downloaded. Patient somatic mutation calls were downloaded from the GDAC firehose (Broad Institute TCGA Genome Data Analysis Center, 2016), while a list of human splicing-associated gene mutations (keyword search "mRNA splicing [KW-0508]") was downloaded from the UniProt database (UniProt Consortium, 2019).…”
Section: Methods Details, Data Download (mentioning)
confidence: 99%
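
The sample-filtering step described in the statement above (dropping any sample with "tumor" in its sample_name field) could look roughly like the sketch below. It is not the citing authors' code: the function name, the record layout and the example accessions are hypothetical, and matching case-insensitively is an assumption the quoted text does not confirm.

```python
def drop_tumor_samples(samples):
    """Return the records whose sample_name does not contain 'tumor'.

    Assumption: matching is case-insensitive; the quoted statement does
    not say whether the original filter was.
    """
    kept = []
    for record in samples:
        # Treat a missing or empty sample_name as an empty string.
        name = record.get("sample_name") or ""
        if "tumor" not in name.lower():
            kept.append(record)
    return kept


# Hypothetical records for illustration; not real accessions.
example_samples = [
    {"accession": "RUN_A", "sample_name": "liver normal"},
    {"accession": "RUN_B", "sample_name": "Liver Tumor biopsy"},
    {"accession": "RUN_C", "sample_name": None},
]

non_cancer = drop_tumor_samples(example_samples)
print([record["accession"] for record in non_cancer])
```

Records with a missing sample_name are kept here; whether that matches the original pipeline is unknown.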
“…(I) Comparison of TCGA-cohort prevalence of junctions occurring vs. not occurring in SRA cancer samples: log-scale box plots representing, for selected TCGA cancer types, the prevalences within each cancer-type cohort of junctions occurring in at least 1% of cancer-type samples, separated into prevalences for junctions (orange, left) found or (blue, right) not found in type-matched cancer sample(s) from the SRA. Selected TCGA cancer types are those for which cancer-matched SRA sample junctions are available from Snaptron (Wilks et al, 2018) and at least 50 TCGA cancer junctions not found in core normal samples are present in the cancer-type matched SRA samples. Most junctions are TCGA-specific, but junctions that are also found in an type-matched SRA cancer cohort have on average higher TCGA-cohort prevalences.…”
Section: Supplemental Information (mentioning)
confidence: 99%