2015
DOI: 10.1002/humu.22727

VariSNP, A Benchmark Database for Variations From dbSNP

Abstract: For development and evaluation of methods for predicting the effects of variations, benchmark datasets are needed. Some previously developed datasets are available for this purpose, but newer and larger benchmark sets for benign variants have largely been missing. VariSNP datasets are selected from dbSNP. These subsets were filtered against disease-related variants in the ClinVar, UniProtKB/Swiss-Prot, and PhenCode databases, to identify neutral or nonpathogenic cases. All variant descriptions include mapping …
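The filtering step described in the abstract can be pictured as a set difference. The following is a minimal sketch only, assuming variants are compared by a shared identifier; the function name `filter_neutral` and all identifiers are hypothetical placeholders, and it is not the authors' actual pipeline.

```python
def filter_neutral(candidates, disease_sets):
    """Return candidate variant IDs that appear in none of the disease sets."""
    # Pool every identifier flagged as disease-related by any source.
    flagged = set().union(*disease_sets)
    # Keep only candidates that no source has flagged.
    return sorted(set(candidates) - flagged)


# Hypothetical identifiers for illustration only (not real database records).
dbsnp_ids = ["rs1", "rs2", "rs3", "rs4", "rs5"]
clinvar_ids = {"rs2"}
swissprot_ids = {"rs4"}
phencode_ids = {"rs2", "rs5"}

neutral = filter_neutral(dbsnp_ids, [clinvar_ids, swissprot_ids, phencode_ids])
print(neutral)
```

Matching on a single identifier is an assumption made for brevity; how the real datasets match variants across databases is not stated in the truncated abstract.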

Cited by 52 publications (38 citation statements)
References 19 publications (22 reference statements)

“…Usually, the datasets have been different but recently benchmark datasets of sufficient size have become available and are being used increasingly. The VariBench database [Nair and Vihinen, 2013] contains positive and negative datasets for several types and effects of variants, whereas the VariSNP database contains only benign variants extracted from dbSNP [Schaafsma and Vihinen, 2015].…”
Section: Predictors For Amino Acid Substitutions (mentioning)
confidence: 99%
“…Genetic differences can be manifested at different levels as a Single Nucleotide Polymorphism (SNPs), which is a genetic change of single nucleotide or as non‐synonymous SNP (nsSNP), which results in amino acid change in the corresponding transcribed product. In this work we focus on substitutions of single amino acid in the corresponding protein and following the literature such a change is termed single amino acid variation (SAV). The SAV can affect the corresponding protein's function and thus may be associated with human diseases.…”
Section: Introduction (mentioning)
confidence: 99%
“…We have collected benchmark variation datasets for various prediction tasks to VariBench (Nair & Vihinen, ). VariSNP is another benchmark database containing neutral variants from dbSNP after filtering out disease‐causing and cancer variants (Schaafsma & Vihinen, ). When using benchmark datasets for testing the performance of machine learning (ML) tools, the training and test datasets have to be disjoint (Vihinen, ; Walsh, Pollastri, & Tosatto, ).…”
Section: Performance Assessment and Measures (mentioning)
confidence: 99%
“…Our group has a long experience and interest in investigating variants and their effects and includes protein engineering experiments to improve enzyme properties (Nera, Brockmann, Vihinen, Smith, & Mattsson, ; Rasila, Vihinen, Paulin, Haapa‐Paananen, & Savilahti, ; Vihinen et al., ; Vihinen & Mäntsälä, ; Vihinen, Helin, & Mäntsälä, ; Vihinen, Peltonen, Iitia, Suominen, & Mäntsälä, ), variant collection and distribution on locus‐specific variation databases (LSDBs) (Piirilä, Väliaho, & Vihinen, ; Väliaho, Smith, & Vihinen, ; Vihinen et al., ), interpretation of variants and their effects (Lee et al., ; Väliaho, Faisal, Ortutay, Smith, & Vihinen, ; Vihinen et al., ), and the development of recommendations and standards for variation data (Celli, Dalgleish, Vihinen, Taschner, & den Dunnen, ; Vihinen et al., ; Vihinen, den Dunnen, Dalgleish, and Cotton, ) as well as the development of various prediction tools to filter and interpret harmful variants (Ali, Olatubosun, & Vihinen, ; Niroula & Vihinen, ; Niroula & Vihinen, ; Niroula, Urolagin, & Vihinen, ; Olatubosun, Väliaho, Härkönen, Thusberg, & Vihinen, ; Yang, Niroula, Shen, & Vihinen, ). In addition, we have promoted the importance of systematic performance assessments (Khan & Vihinen, ; Thusberg et al., ), systematic measures and reporting of prediction methods (Vihinen, ; Vihinen, ), and the need for benchmark datasets (Nair & Vihinen, ; Schaafsma & Vihinen, ) and for systematics and nomenclature for describing variants (Byrne et al., ; Vihinen, ; Vihinen, ; Vihinen, ). Currently, we curate about 130 LSDBs, mainly for primary immunodeficiencies (PIDs) (Piirilä et al., ; Schaafsma & Vihinen, ) and also for protein kinase and Src homology 2 (SH2) domain variants (Lappalainen, Thusberg, Shen, & Vihinen, ; Ortutay, Väliaho, Stenberg, & Vihinen, ).…”
Section: Introduction (mentioning)
confidence: 99%