James Stephenson scite author profile

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.

Quantifying the contribution of recessive coding variation to developmental disorders

Martin

Jones

McIntyre

et al. 2018

Science

161

109

We estimated the genome-wide contribution of recessive coding variation from 6,040 families from the Deciphering Developmental Disorders study. The proportion of cases attributable to recessive coding variants was 3.6% in patients of European ancestry, compared to 50% explained by de novo coding mutations. It was higher (31%) in patients with Pakistani ancestry, due to elevated autozygosity. Half of this recessive burden is attributable to known genes. We identified two genes not previously associated with recessive developmental disorders, KDM5B and EIF3F, and functionally validated them with mouse and cellular models. Our results suggest that recessive coding variants account for a small fraction of currently undiagnosed non-consanguineous individuals, and that the roles of noncoding variants, incomplete penetrance, and polygenic mechanisms need further exploration.

Quantifying the contribution of recessive coding variation to developmental disorders

Martin

Jones

Stephenson

et al. 2017

Preprint

Large exome-sequencing datasets offer an unprecedented opportunity to understand the genetic architecture of rare diseases, informing clinical genetics counseling and optimal study designs for disease gene identification. We analyzed 7,448 exome-sequenced families from the Deciphering Developmental Disorders study, and, for the first time, estimated the causal contribution of recessive coding variation exome-wide. We found that the proportion of cases attributable to recessive coding variants is surprisingly low in patients of European ancestry, at only 3.6%, versus 50% of cases explained by de novo coding mutations. Surprisingly, we found that, even in European probands with affected siblings, recessive coding variants are only likely to explain ~12% of cases. In contrast, they account for 31% of probands with Pakistani ancestry due to elevated autozygosity. We tested every gene for an excess of damaging homozygous or compound heterozygous genotypes and found three genes that passed stringent Bonferroni correction: EIF3F, KDM5B, and THOC6. EIF3F is a novel disease gene, and KDM5B has previously been reported as a dominant disease gene. KDM5B appears to follow a complex mode of inheritance, in which heterozygous loss-of-function variants (LoFs) show incomplete penetrance and biallelic LoFs are fully penetrant. Our results suggest that a large proportion of undiagnosed developmental disorders remain to be explained by other factors, such as noncoding variants and polygenic risk.

VarSite: Disease variants and protein structure

et al. 2019

VarSite is a web server mapping known disease‐associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image‐based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease‐associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite.

Three-Dimensional RNA Structure of the Major HIV-1 Packaging Signal Region

Stephenson

Kenyon

et al. 2013

Structure

Summary: HIV-1 genomic RNA has a noncoding 5′ region containing sequential conserved structural motifs that control many parts of the life cycle. Very limited data exist on their three-dimensional (3D) conformation and, hence, how they work structurally. To assemble a working model, we experimentally reassessed secondary structure elements of a 240-nt region and used single-molecule distances, derived from fluorescence resonance energy transfer, between defined locations in these elements as restraints to drive folding of the secondary structure into a 3D model with an estimated resolution below 10 Å. The folded 3D model satisfying the data is consensual with short nuclear-magnetic-resonance-solved regions and reveals previously unpredicted motifs, offering insight into earlier functional assays. It is a 3D representation of this entire region, with implications for RNA dimerization and protein binding during regulatory steps. The structural information of this highly conserved region of the virus has the potential to reveal promising therapeutic targets.

Characterizing 3D RNA structure by single molecule FRET

Stephenson

Kenyon

Symmons

et al. 2016

Methods

VarMap: a web tool for mapping genomic coordinates to protein sequence and structure and retrieving protein structural annotations

Stephenson

Laskowski

Nightingale

et al. 2019

Motivation: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. Supplementary information: Supplementary data are available at Bioinformatics online.

Unearthing the Root of Amino Acid Similarity

Stephenson

Freeland

2013

J Mol Evol

Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
