GEMINI: a computationally-efficient search engine for large gene expression datasets

DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

doi:10.1186/s12859-016-0934-8

Cited by 13 publications

(5 citation statements)

References 16 publications


“…VEP (using RefSeq and Ensembl 91) was used to annotate the variants. We used the GEMINI [43] framework that automatically integrates the VCF file into a database for exploring genetic variant for disease and population genetics. Genetic variants were analyzed using GRAVITY, a Cytoscape plugin that we designed for visualizing WES results using Protein-Protein Interaction networks (http://gravity.pasteur.fr/).…”

Section: Methods (mentioning)

confidence: 99%

Both rare and common genetic variants contribute to autism in the Faroe Islands

et al. 2019


The number of genes associated with autism is increasing, but few studies have been performed on epidemiological cohorts and in isolated populations. Here, we investigated 357 individuals from the Faroe Islands including 36 individuals with autism, 136 of their relatives and 185 non-autism controls. Data from SNP array and whole exome sequencing revealed that individuals with autism had a higher burden of rare exonic copy-number variants altering autism associated genes (deletions (p = 0.0352) or duplications (p = 0.0352)), higher inbreeding status (p = 0.023) and a higher load of rare homozygous deleterious variants (p = 0.011) compared to controls. Our analysis supports the role of several genes/loci associated with autism (e.g., NRXN1, ADNP, 22q11 deletion) and identified new truncating (e.g., GRIK2, ROBO1, NINL, and IMMP2L) or recessive deleterious variants (e.g., KIRREL3 and CNTNAP2) affecting autism-associated genes. It also revealed three genes involved in synaptic plasticity, RIMS4, KALRN, and PLA2G4A, carrying de novo deleterious variants in individuals with autism without intellectual disability. In summary, our analysis provides a better understanding of the genetic architecture of autism in isolated populations by highlighting the role of both common and rare gene variants and pointing at new autism-risk genes. It also indicates that more knowledge about how multiple genetic hits affect neuronal function will be necessary to fully understand the genetic architecture of autism.



“…
• Vantage-point: a point which is selected from dataset,
• Radius: a distance defining the range of vantage-point,
• Left-hand side: the left subtree including the data points which are smaller than or equal to the radius of a vantage-point and,
• Right-hand side: the right subtree including the data points that are greater than the radius of a vantage-point.

The main steps of a VP-tree construction are presented as below [46], [51][52][53][54][55]:
1) choose a vantage-point,
2) calculate the distances between the vantage-point and the others,
3) find the median of these distances,
4) accept the median as a splitting value,
5) according to the splitting value, partition data space into two subspaces,
6) go to step 1 until no data point is left.…”

Section: B Kd-tree and Vp-tree (mentioning)

confidence: 99%

A New Anonymization Model for Privacy Preserving Data Publishing: CANON

Canbay, Sağıroğlu, Vural 2022

Balkan Journal of Electrical and Computer Engineering


Data privacy is a challenging trade-off between privacy preservation and data utility. Anonymization is a fundamental approach to privacy preservation and is itself a hard trade-off problem: it hides the identities of data subjects or record owners, and near-optimal solutions must be developed for it. In this paper, a new multidimensional anonymization model (CANON) is proposed for the first time that employs a vantage-point tree (VP-tree) for greedy partitioning and multidimensional generalization for anonymization. The main concept of CANON is inspired by Mondrian, an anonymization model for privacy-preserving data publishing. Experimental results show that CANON takes data distribution into consideration and creates equivalence classes containing closer data points than Mondrian does. As a result, CANON provides better data utility than Mondrian in terms of the GCP metric and is a promising anonymization model for future work.
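The six VP-tree construction steps quoted in the citation statement for this entry can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by CANON or GEMINI: the names `VPNode`, `build_vp_tree`, and `points_in` are hypothetical, the vantage point is picked at random (the quoted text does not say how it is chosen), and Euclidean distance is assumed as the metric.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Assumed metric; any function satisfying the triangle inequality works."""
    return math.dist(a, b)


@dataclass
class VPNode:
    vantage_point: Point
    radius: float = 0.0                 # median distance used as the split value
    left: Optional["VPNode"] = None     # points with distance <= radius
    right: Optional["VPNode"] = None    # points with distance > radius


def build_vp_tree(
    points: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
    rng: Optional[random.Random] = None,
) -> Optional[VPNode]:
    """Recursively build a VP-tree following the six quoted steps."""
    if not points:
        return None                      # step 6: stop when no data point is left
    if rng is None:
        rng = random.Random(0)           # fixed seed so the sketch is reproducible
    remaining = list(points)

    # Step 1: choose a vantage point (randomly here; this is an assumption).
    vp = remaining.pop(rng.randrange(len(remaining)))
    node = VPNode(vantage_point=vp)
    if not remaining:
        return node

    # Step 2: distances between the vantage point and all other points.
    dists = [distance(vp, p) for p in remaining]

    # Steps 3-4: the median distance becomes the splitting value (radius).
    node.radius = statistics.median(dists)

    # Step 5: partition the remaining points into two subspaces.
    inner = [p for p, d in zip(remaining, dists) if d <= node.radius]
    outer = [p for p, d in zip(remaining, dists) if d > node.radius]

    # Step 6: repeat on each subspace.
    node.left = build_vp_tree(inner, distance, rng)
    node.right = build_vp_tree(outer, distance, rng)
    return node


def points_in(node: Optional[VPNode]) -> List[Point]:
    """Collect every point stored in a (sub)tree; handy for checking the split."""
    if node is None:
        return []
    return [node.vantage_point] + points_in(node.left) + points_in(node.right)
```

Because each call removes the vantage point before recursing, the recursion terminates even when many points are equidistant from it. Splitting at the median keeps the two subtrees roughly balanced when the distances are distinct.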


“…Annotating and filtering large numbers of variant alleles require specialty software. Existing annotators, such as ANNOVAR [1], SeqAnt [2], VEP [3], and GEMINI [4] have played an important research role, and are sufficient for small to medium experiments (e.g., read 10s to 100s of WES samples). However, they require significant computer science training to use in offline, distributed computing environments and have substantial restrictions in terms of performance and the maximum size of the data they will annotate online.…”

Section: Introduction (mentioning)

confidence: 99%

Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

et al. 2018


Accurately selecting relevant alleles in large sequencing experiments remains technically challenging. Bystro (https://bystro.io/) is the first online, cloud-based application that makes variant annotation and filtering accessible to all researchers for terabyte-sized whole-genome experiments containing thousands of samples. Its key innovation is a general-purpose, natural-language search engine that enables users to identify and export alleles and samples of interest in milliseconds. The search engine dramatically simplifies complex filtering tasks that previously required programming experience or specialty command-line programs. Critically, Bystro’s annotation and filtering capabilities are orders of magnitude faster than previous solutions, saving weeks of processing time for large experiments.

Electronic supplementary material: The online version of this article (10.1186/s13059-018-1387-3) contains supplementary material, which is available to authorized users.

