Eric Moyer scite author profile

Eric Moyer

3 Publications

72 Citation Statements Received

61 Citation Statements Given


Affiliations

National Center for Biotechnology Information, National Institutes of Health

Publications


SPDI: data model for variants and applications at NCBI

Holmes, Moyer, Phan, et al. 2019


Motivation: Normalizing sequence variants on a reference, projecting them across congruent sequences, and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases, and tools results in discrepancies that complicate analysis. NCBI's genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants.

Results: The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF, and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm (VOCA), returns a unique, normalized representation termed the "Contextual Allele". The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset (ADS) of non-assembly NCBI RefSeq sequences (prefixed NM, NR, NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs, and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique "Canonical Allele" and is used directly to aggregate variants across congruent sequences.

Availability: The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0

Supplementary information: Supplementary data are available at Bioinformatics online.
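As a rough illustration of the four SPDI attributes described in the abstract, the following Python sketch parses a colon-separated SPDI string and applies it to a reference sequence. It is not NCBI's implementation: the names `Spdi`, `parse_spdi`, `apply_spdi`, and the identifier `toy_seq` are hypothetical, and the sketch assumes a 0-based position and a deletion field given either as bases or as a length.

```python
from typing import NamedTuple


class Spdi(NamedTuple):
    """Hypothetical container for the four SPDI attributes."""
    sequence: str   # identifier of the reference sequence
    position: int   # number of positions to advance from the start boundary
    deletion: str   # deleted reference bases, or their count as digits
    insertion: str  # bases inserted in place of the deleted ones


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deletion:insertion' into its attributes."""
    sequence, position, deletion, insertion = text.split(":")
    return Spdi(sequence, int(position), deletion, insertion)


def apply_spdi(reference: str, variant: Spdi) -> str:
    """Apply the variant to the reference bases and return the new sequence."""
    start = variant.position
    if variant.deletion.isdigit():
        # Deletion given as a length: nothing to compare against the reference.
        end = start + int(variant.deletion)
    else:
        end = start + len(variant.deletion)
        if reference[start:end] != variant.deletion:
            raise ValueError("deleted bases do not match the reference")
    if end > len(reference):
        raise ValueError("variant extends past the end of the reference")
    return reference[:start] + variant.insertion + reference[end:]


# Toy reference CTAAAG: replace the AAA run with AA (a one-base deletion).
variant = parse_spdi("toy_seq:2:AAA:AA")
result = apply_spdi("CTAAAG", variant)
```

Storing the deleted bases rather than only their count lets the sketch check that the variant really matches the reference it is applied to.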


The GA4GH Variation Representation Specification: A computational framework for variation representation and federated identification

et al. 2021


SPDI: Data Model for Variants and Applications at NCBI

Holmes, Moyer, Phan, et al. 2019 (Preprint)


Motivation: Normalizing diverse representations of sequence variants is critical to the elucidation of the genetic basis of disease and biological function. NCBI has long wrestled with integrating data from multiple submitters to build databases such as dbSNP and ClinVar. Inconsistent representation of variants among variant callers, local databases, and tools results in discrepancies and duplications that complicate analysis. Current tools are not robust enough to manage variants in different formats and different reference sequence coordinates.

Results: The SPDI (pronounced "speedy") data model defines variants as a sequence of 4 operations: start at the boundary before the first position in the sequence S, advance P positions, delete D positions, then insert the sequence in the string I, giving the data model its name, SPDI. The SPDI model can thus be applied to both nucleotide and protein variants, but the services discussed here are limited to nucleotide variants. Current services convert representations between HGVS, VCF, and SPDI and provide two forms of normalization. The first, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the "Contextual Allele" for any input. The SPDI name, with its four operations, defines exactly the reference subsequence potentially affected by the variant, even in low complexity regions such as homopolymer and dinucleotide sequence repeats. The second level of normalization depends on an alignment dataset (ADS). SPDI services perform remapping (also known as lift-over) of variants from the input reference sequence to return a list of all equivalent Contextual Alleles based on the transcript or genomic sequences that were aligned. One of these Contextual Alleles, usually the one based on the latest genomic assembly such as GRCh38, is selected to represent them all and is designated the unique "Canonical Allele". ADS includes alignments between non-assembly RefSeq sequences (prefixed NM, NR, NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs, and NWs), and this allows for robust remapping and normalization of variants across sequences and assembly versions.

Availability and implementation: The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0/
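To make the idea of a Contextual Allele more concrete, the sketch below expands a variant over the whole repeat region in which it could equally be placed. This is an assumption-laden illustration of the general technique (trim shared bases, then roll a pure insertion or deletion left and right), not NCBI's Variant Overprecision Correction Algorithm; the function name `contextual_allele` and the toy sequences are hypothetical, and positions are assumed to be 0-based.

```python
def contextual_allele(ref: str, pos: int, deleted: str, inserted: str):
    """Return (position, deleted, inserted) widened over the ambiguous region."""
    # Trim bases shared by both alleles, suffix first, then prefix.
    while deleted and inserted and deleted[-1] == inserted[-1]:
        deleted, inserted = deleted[:-1], inserted[:-1]
    while deleted and inserted and deleted[0] == inserted[0]:
        deleted, inserted = deleted[1:], inserted[1:]
        pos += 1

    # Only a pure insertion or a pure deletion can be shifted along a repeat.
    if bool(deleted) == bool(inserted):
        return pos, deleted, inserted

    n_del = len(deleted)
    unit = deleted or inserted

    # Roll left while the preceding reference base matches the unit's last base.
    left, rolled = pos, unit
    while left > 0 and ref[left - 1] == rolled[-1]:
        rolled = rolled[-1] + rolled[:-1]
        left -= 1
    left_unit = rolled

    # Roll right while the following reference base matches the unit's first base.
    right, rolled = pos, unit
    while right + n_del < len(ref) and ref[right + n_del] == rolled[0]:
        rolled = rolled[1:] + rolled[0]
        right += 1

    # The widened deletion covers every placement of the original variant.
    span = ref[left:right + n_del]
    if deleted:
        return left, span, span[n_del:]
    return left, span, left_unit + span


# Deleting any single A from the AAA run in CTAAAG gives the same result.
homopolymer = contextual_allele("CTAAAG", 3, "A", "")
# Inserting CA anywhere inside the CACAC repeat gives the same result.
dinucleotide = contextual_allele("GCACACT", 1, "", "CA")
```

In both toy cases the returned deletion spans the full repeat, which is the sense in which the representation names exactly the reference subsequence potentially affected by the variant.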

