Preprint (2019)
DOI: 10.1101/537449

SPDI: Data Model for Variants and Applications at NCBI

Abstract: Motivation: Normalizing diverse representations of sequence variants is critical to the elucidation of the genetic basis of disease and biological function. NCBI has long wrestled with integrating data from multiple submitters to build databases such as dbSNP and ClinVar. Inconsistent representation of variants among variant callers, local databases, and tools results in discrepancies and duplications that complicate analysis. Current tools are not robust enough to manage variants in different formats and differ…

Cited by 14 publications (19 citation statements); references 15 publications (8 reference statements).

“…In terms of genetic cardiomyopathies, pathogenic S1PR1 variants have not been described in humans. However, the Genome Aggregation Database (gnomAD), a data set of 125,748 exome sequences and 15,708 whole-genome sequences from unrelated individuals, identified two unique frameshift variant alleles at amino acid positions 255 and 256 (rs1166435525 and rs1395550411 in the dbSNP database, respectively) [39,40]. These two variants disrupt S1P1 at the origin of transmembrane domain six.…”
Section: Discussion; citation type: mentioning; confidence: 99%
“…That is, the same empirical resulting sequence could be represented with multiple variation expressions. Normalization is the process by which a representation is converted into a single canonical form [11,28–30]. VRS adopts the use of a fully-justified representation (Figure 3b), ensuring that insertions and deletions in repetitive regions are not arbitrarily located to a specific position within a sequence, but instead describe the alteration over the entire region of ambiguity.…”
Section: Conventions That Promote Reliable Data Sharing; citation type: mentioning; confidence: 99%
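The ambiguity the statement describes can be seen in a few lines. The minimal sketch below assumes 0-based inter-residue (interbase) coordinates, where an edit replaces the half-open interval [start, end) of a reference with an alternate string; the reference `GCAAAT` and the helper `apply_edit` are hypothetical and are not taken from SPDI, VRS, or any NCBI tool. Deleting any one of the three A's yields the same resulting sequence, and the last tuple is the fully justified form that spans the whole run.

```python
def apply_edit(seq: str, start: int, end: int, alt: str) -> str:
    """Replace the interbase interval [start, end) of seq with alt (hypothetical helper)."""
    return seq[:start] + alt + seq[end:]


reference = "GCAAAT"  # hypothetical toy reference with a run of three A's

# Four different expressions of "one A is missing from the run":
representations = [
    (2, 3, ""),    # delete the first A
    (3, 4, ""),    # delete the middle A
    (4, 5, ""),    # delete the last A
    (2, 5, "AA"),  # fully justified: the whole AAA run becomes AA
]

# Every expression produces the same resulting sequence.
results = {apply_edit(reference, s, e, a) for s, e, a in representations}
print(results)  # {'GCAAT'}
```

Only the last representation is position-independent: it does not pick one A arbitrarily but covers the entire region of ambiguity.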
“…Coordinates are the location of the insertion on the reference sequence depicted in panel A, using residue or inter-residue coordinates as specified by the corresponding representation system. (C) Full-justification Allele normalization is enabled by a specified normalization algorithm extended from VOCA [28]. In this example, the unnormalized Allele “reference” and “alternate” sequences (step 0) are trimmed of their common suffix “CA” (step 1).…”
Section: Introduction; citation type: mentioning; confidence: 99%
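The trim-then-expand idea in the quoted caption can be sketched as follows. This is a simplified illustration in the spirit of that procedure, not the VRS, VOCA, or NCBI implementation: it assumes 0-based interbase coordinates, trims the common suffix before the common prefix (matching the order in the caption), leaves substitutions untouched, and for a pure insertion or deletion rolls the indel left and right through the reference as far as the repeated sequence allows. The function name `normalize` and the toy sequences are hypothetical.

```python
def normalize(seq: str, start: int, end: int, alt: str) -> tuple[int, int, str]:
    """Fully justify the edit seq[start:end] -> alt (simplified, hypothetical sketch).

    Returns (start, end, alt); seq[start:end] is the normalized reference allele.
    """
    ref = seq[start:end]
    # Step 1: trim the common suffix, then the common prefix.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt, end = ref[:-1], alt[:-1], end - 1
    while ref and alt and ref[0] == alt[0]:
        ref, alt, start = ref[1:], alt[1:], start + 1

    x = ref or alt  # the deleted (or inserted) sequence
    if not x or (ref and alt):
        return start, end, alt  # no change, or a substitution: nothing to roll

    n = len(x)
    # Step 2: count how far the indel can shift left; the pattern rotates as it moves.
    left = 0
    while start - left - 1 >= 0 and seq[start - left - 1] == x[-(left % n) - 1]:
        left += 1
    # Step 3: count how far it can shift right.
    right = 0
    while end + right < len(seq) and seq[end + right] == x[right % n]:
        right += 1

    # Step 4: expand the interval over the whole ambiguous region.
    new_alt = seq[start - left:start] + alt + seq[end:end + right]
    return start - left, end + right, new_alt


# Reference allele "CA" -> alternate "CACA": trimming the common suffix "CA"
# leaves an insertion of "CA", which is then expanded over the CACA repeat.
print(normalize("GTCACAT", 4, 6, "CACA"))  # (2, 6, 'CACACA')
```

Because every placement of the same indel expands to the same interval, left-shifted and right-shifted inputs converge on one output. For example, deleting either the first or the last A of `GCAAAT` normalizes to `(2, 5, 'AA')`.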
“…As early as 1993, a standard gene-based nomenclature was proposed [11] that would later become the Human Genome Variation Society notation. This approach, however, produces several ambiguous edge cases that hamper exact determination [68].…”
Section: GRCh39 and Beyond: Future Challenges of Human Genome Annotat…; citation type: mentioning; confidence: 99%