2016
DOI: 10.1186/s12859-016-0934-8

GEMINI: a computationally-efficient search engine for large gene expression datasets

Abstract: Background: Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number, or metadata tag. However, in this search paradigm the form of the query – a text-based string – is mismatched with the form of the target – a genomic profile. Results: To improve access to massive genomic data resources, we have developed a fast search engine,…

Cited by 13 publications (5 citation statements)
References 16 publications
“…VEP (using RefSeq and Ensembl 91) was used to annotate the variants. We used the GEMINI 43 framework that automatically integrates the VCF file into a database for exploring genetic variant for disease and population genetics. Genetic variants were analyzed using GRAVITY, a Cytoscape plugin that we designed for visualizing WES results using Protein-Protein Interaction networks (http://gravity.pasteur.fr/).…”
Section: Methods (mentioning)
confidence: 99%
“…• Vantage-point: a point which is selected from dataset, • Radius: a distance defining the range of vantage-point, • Left-hand side: the left subtree including the data points which are smaller than or equal to the radius of a vantage-point and, • Right-hand side: the right subtree including the data points that are greater than the radius of a vantage-point. The main steps of a VP-tree construction are presented as below [46], [51][52][53][54][55]; 1) choose a vantage-point, 2) calculate the distances between the vantage-point and the others, 3) find the median of these distances, 4) accept the median as a splitting value, 5) according to the splitting value, partition data space into two subspaces, 6) go to step 1 until no data point is left.…”
Section: B. Kd-tree and Vp-tree (mentioning)
confidence: 99%
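
The six construction steps quoted in the statement above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by GEMINI or by the citing paper: the names `VPNode`, `build_vp_tree`, `nearest`, and `euclidean` are hypothetical, the vantage point is chosen at random (the quote does not say how it is chosen), and Euclidean distance stands in for whatever metric a real system would use. The nearest-neighbour search is an addition for illustration; the quote describes only construction.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Assumed metric; any function obeying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class VPNode:
    vantage: Point                      # the selected vantage point
    radius: float = 0.0                 # median distance used as splitting value
    left: Optional["VPNode"] = None     # points with distance <= radius
    right: Optional["VPNode"] = None    # points with distance > radius


def build_vp_tree(points: Sequence[Point], dist=euclidean, rng=None) -> Optional[VPNode]:
    if not points:
        return None                     # step 6: stop when no data point is left
    rng = rng or random.Random(0)
    pts = list(points)
    # Step 1: choose a vantage point (random choice is an assumption).
    node = VPNode(pts.pop(rng.randrange(len(pts))))
    if not pts:
        return node
    # Step 2: distances between the vantage point and the others.
    dists = [dist(node.vantage, p) for p in pts]
    # Steps 3-4: the median of these distances becomes the splitting value.
    node.radius = statistics.median(dists)
    # Step 5: partition the remaining points into two subspaces.
    inside = [p for p, d in zip(pts, dists) if d <= node.radius]
    outside = [p for p, d in zip(pts, dists) if d > node.radius]
    # Step 6: repeat on each subspace.
    node.left = build_vp_tree(inside, dist, rng)
    node.right = build_vp_tree(outside, dist, rng)
    return node


def nearest(node: Optional[VPNode], query: Point, dist=euclidean, best=None):
    """Return (distance, point) of the closest stored point, or None if empty."""
    if node is None:
        return best
    d = dist(query, node.vantage)
    if best is None or d < best[0]:
        best = (d, node.vantage)
    # Search the side that contains the query first.
    near, far = (node.left, node.right) if d <= node.radius else (node.right, node.left)
    best = nearest(near, query, dist, best)
    # By the triangle inequality, the far side can only hold a closer point
    # if the current best ball around the query crosses the splitting radius.
    if abs(d - node.radius) <= best[0]:
        best = nearest(far, query, dist, best)
    return best
```

A call such as `nearest(build_vp_tree(points), query)` returns the closest stored point and its distance. The pruning test is what lets the search skip whole subtrees, which is the usual motivation for this kind of index over a linear scan.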
“…Annotating and filtering large numbers of variant alleles require specialty software. Existing annotators, such as ANNOVAR [1], SeqAnt [2], VEP [3], and GEMINI [4] have played an important research role, and are sufficient for small to medium experiments (e.g., read 10s to 100s of WES samples). However, they require significant computer science training to use in offline, distributed computing environments and have substantial restrictions in terms of performance and the maximum size of the data they will annotate online.…”
Section: Introduction (mentioning)
confidence: 99%