Harry M. Scholes scite author profile

CATH: increased structural coverage of functional space

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65 351 fully-classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.

CATH: expanding the horizons of structure-based functional annotations for genome sequences

Sillitoe et al. 2018

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully-classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution, protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: allowing the user to view an alignment between their query sequence and a representative FunFam structure and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.

SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals

Lam, Bordin, Waman et al. 2020, Sci Rep

SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to infections in humans and other mammals. To enter host cells, the viral spike protein (S-protein) binds to its receptor, ACE2, and is then processed by TMPRSS2. Whilst receptor binding contributes to the viral host range, S-protein:ACE2 complexes from other animals have not been investigated widely. To predict infection risks, we modelled S-protein:ACE2 complexes from 215 vertebrate species, calculated changes in the energy of the complex caused by mutations in each species, relative to human ACE2, and correlated these changes with COVID-19 infection data. We also analysed structural interactions to better understand the key residues contributing to affinity. We predict that mutations are more detrimental in ACE2 than TMPRSS2. Finally, we demonstrate phylogenetically that human SARS-CoV-2 strains have been isolated in animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but few fish, birds or reptiles. Susceptible animals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance.

SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals

Bordin, Waman et al. 2020, Preprint

SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to infections in humans and other mammals. To enter host cells, the viral spike protein (S-protein) binds to its receptor, ACE2, and is then processed by TMPRSS2. Whilst receptor binding contributes to the viral host range, S-protein:ACE2 complexes from other animals have not been investigated widely. To predict infection risks, we modelled S-protein:ACE2 complexes from 215 vertebrate species, calculated their relative energies, correlated these energies to COVID-19 infection data, and analysed structural interactions. We predict that known mutations are more detrimental in ACE2 than TMPRSS2. Finally, we demonstrate phylogenetically that human SARS-CoV-2 strains have been isolated in animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but not fish, birds or reptiles. Susceptible animals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance.

CD8+ T Cell Responses to Lytic EBV Infection: Late Antigen Specificities as Subdominant Components of the Total Response

Abbott, Quinn, Leese et al. 2013

Epstein-Barr virus (EBV) elicits primary CD8+ T cell responses that, by T cell cloning from infectious mononucleosis (IM) patients, appear skewed towards immediate early (IE) and some early (E) lytic cycle proteins, with late (L) proteins rarely targeted. However, L antigen-specific responses have been regularly detected in polyclonal T cell cultures from long-term virus carriers. To resolve this apparent difference between responses to primary and persistent infection, 13 long-term carriers were screened in ex vivo IFN-γ ELISPOT assays using peptides spanning the 2 IE, 6 representative E and 7 representative L proteins. This revealed memory CD8 responses to 44 new lytic cycle epitopes that straddle all three protein classes but, in terms of both frequency and size, maintain the IE > E > L hierarchy of immunodominance. Having identified the HLA restriction of 10 (including 7L) new epitopes using memory CD8+ T cell clones, we looked in HLA-matched IM patients and found such reactivities but typically at low levels, explaining why they had gone undetected in the original IM clonal screens. Wherever tested, all CD8+ T cell clones against these novel lytic cycle epitopes recognised lytically-infected cells naturally expressing their target antigen. Surprisingly, however, clones against the most frequently recognised L antigen, the BNRF1 tegument protein, also recognised latently-infected, growth-transformed cells. We infer that BNRF1 is also a latent antigen that could be targeted in T cell therapy of EBV-driven B-lymphoproliferative disease.

CATH functional families predict functional sites in proteins

Das, Scholes, Sen et al. 2020

Motivation: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results: FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly-available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. Availability: https://github.com/UCL/cath-funsite-predictor. Contact: c.orengo@ucl.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.

CATH functional families predict protein functional sites

Das, Scholes, Orengo 2020, Preprint

Motivation: Identification of functional sites in proteins is essential for functional characterisation, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed all publicly-available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. Availability: The datasets and prediction models are available on request. Contact

Dynamic changes in the brain protein interaction network correlates with progression of Aβ42 pathology in Drosophila

Scholes, Cryar, Kerr et al. 2020, Sci Rep

Alzheimer’s disease (AD), the most prevalent form of dementia, is a progressive and devastating neurodegenerative condition for which there are no effective treatments. Understanding the molecular pathology of AD during disease progression may identify new ways to reduce neuronal damage. Here, we present a longitudinal study tracking dynamic proteomic alterations in the brains of an inducible Drosophila melanogaster model of AD expressing the Arctic mutant Aβ42 gene. We identified 3093 proteins from flies that were induced to express Aβ42 and age-matched healthy controls using label-free quantitative ion-mobility data independent analysis mass spectrometry. Of these, 228 proteins were significantly altered by Aβ42 accumulation and were enriched for AD-associated processes. Network analyses further revealed that these proteins have distinct hub and bottleneck properties in the brain protein interaction network, suggesting that several may have significant effects on brain function. Our unbiased analysis provides useful insights into the key processes governing the progression of amyloid toxicity and forms a basis for further functional analyses in model organisms and translation to mammalian systems.

