UniProt: the universal protein knowledgebase in 2021

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Agivetova, Rahat; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily; Britto, Ramona; Bursteinas, Borisas; Bye-A-Jee, Hema; Coetzee, Ray; Cukura, Austra; Silva, Alan Da; Denny, Paul; Doğan, Tunca; Ebenezer, ThankGod Echezona; Fan, Jun; Castro, Leyla Garcia; Garmiri, Penelope; Georghiou, George P.; Gonzales, Leonardo; Hatton-Ellis, Emma; Hussein, Abdulrahman; Ignatchenko, Alexandr; Insana, Giuseppe; Ishtiaq, Rizwan; Jokinen, Petteri; Joshi, Vishal; Jyothi, Dushyanth; Lock, Antonia; López, Rodrigo; Luciani, Aurélien; Luo, Jie; Lussi, Yvonne C.; MacDougall, Alistair J.; Madeira, Fábio; Mahmoudy, Mahdi; Menchi, Manuela; Mishra, Alok; Moulang, Katie; Nightingale, Andrew; Oliveira, Carla Susana; Pundir, Sangya; Qi, Guoying; Raj, Shriya; Rice, Daniel L; Lopez, Milagros Rodriguez; Saidi, Rabie; Sampson, J. H.; Sawford, Tony; Speretta, Elena; Turner, E. B.; Tyagi, Nidhi; Vasudev, Preethi; Volynkin, Vladimir; Warner, Kate; Watkins, Xavier; Zaru, Rossana; Zellner, Hermann; Bridge, Alan; Poux, Sylvain; Redaschi, Nicole; Aimo, Lucila; Argoud-Puy, Ghislaine; Auchincloss, Andrea; Axelsen, Kristian; Bansal, Parit; Baratin, Delphine; Blatter, Marie-Claude; Bolleman, Jerven; Boutet, Emmanuel; Breuza, Lionel; Casals-Casas, Cristina; Castro, Edouard de; Echioukh, Kamal Chikh; Coudert, Elisabeth; Cuche, Béatrice A.; Doche, Mikael; Dornevil, Dolnide; Estreicher, Anne; Famiglietti, Maria Livia; Feuermann, Marc; Gasteiger, Elisabeth; Géhant, Sébastien; Gerritsen, Vivienne Baillie; Gos, Arnaud; Gruaz-Gumowski, Nadine; Hinz, Ursula; Hulo, Chantal; Hyka‐Nouspikel, Nevila; Jungo, Florence; Keller, Guillaume; Kerhornou, Arnaud; Lara, Vicente; Mercier, Philippe Le; Lieberherr, Damien; Lombardot, Thierry; Martin, Xavier D.; Masson, Patrick; Morgat, Anne; Neto, Teresa Batista; Paesano, Salvo; Pedruzzi, Ivo; Pilbout, Sandrine; Pourcel, Lucille; Pozzato, Monica; Pruess, Manuela; Rivoire, Catherine; Sigrist, Christian J. A.; Sonesson, Karin; Stutz, André; Sundaram, Shyamala; Tognolli, Michael; Verbregue, Laure; Wu, Cathy H.; Arighi, Cecilia N.; Arminski, Leslie; Chen, Chuming; Chen, Yongxing; Garavelli, John S.; Huang, Hongzhan; Laiho, Kati; McGarvey, Peter B.; Natale, Darren A.; Ross, Karen; Vinayaka, C. R.; Wang, Qinghua; Wang, Yuqi; Yeh, Lai-Su L.; Zhang, Jian; Ruch, Patrick; Teodoro, Douglas

doi:10.1093/nar/gkaa1100

Cited by 4,900 publications

(2,497 citation statements)

References 50 publications

Supporting

Mentioning

2,480

Contrasting

Unclassified


“…multiple sequence alignments (MSAs) and position-specific scoring matrices (PSSMs) computed by a combination of pairwise BLAST (24), PSI-BLAST (25), and MMseqs2 (11, 12) on query vs. PDB (26) and query vs. UniProt (1). For each residue in the query, the following per-residue predictions are assembled: secondary structure (RePROF/PROFsec (5, 27) and ProtBertSec (14)); solvent accessibility (RePROF/PROFacc); transmembrane helices and strands (TMSEG (28) and PROFtmb (29)); protein disorder (Meta-Disorder (30)); backbone flexibility (relative B-values; PROFbval (31)); disulfide bridges (DISULFIND (32)); sequence conservation (ConSurf/ConSeq (33–36)); protein-protein, protein-DNA, and protein-RNA binding residues (ProNA2020 (3)); PROSITE motifs (37); effects of sequence variation (single amino acid variants, SAVs; SNAP2 (38)).…”

Section: Methods (mentioning)

confidence: 99%

“…Sequence similarity and automatic assignment via UniRule suggest NCAP is RNA binding (binding with the viral genome), binding with the membrane protein M (UniProt identifier P0DTC5/VME1_SARS2), and is fundamental for virion assembly. goPredSim (19) transferred GO terms from other proteins for MFO (RNA-binding; GO:0003723; ECO:0000213) and CCO (compartments in the host cell and viral nucleocapsid; GO:0019013; GO:0044172; GO:0044177; GO:0044220; GO:0030430; ECO:0000255) matching annotations found in UniProt (1). While it missed the experimentally verified MFO term identical protein binding (GO:0042802), goPredSim predicted protein folding (GO:0006457) and protein ubiquitination (GO:0016567) suggesting the nucleoprotein to be involved in biological processes requiring protein binding.…”

Section: Use Case (mentioning)

confidence: 99%

“…The sequence is known for far more proteins (1) than experimental annotations of function or structure (2, 3). This sequence-annotation gap existed when PredictProtein (4, 5) started in 1992, and has kept expanding ever since (6).…”

Section: Introduction (mentioning)

confidence: 99%


PredictProtein – Predicting Protein Structure and Function for 29 Years

Bernhofer, Dallago, Karl et al. 2021

Preprint


Since 1992 PredictProtein (https://predictprotein.org) is a one-stop online resource for protein sequence analysis with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold; user interface elements improved usability, and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.


“…Authors chose 3-mers amino acids for proteins and 5-mers for nucleic acids as words. Three datasets (Uniprot400k [81], RRM3k [82], and Homeo8k [83]) were used to pre-train the Fast-Bioseq protein embedding models, whereas RNA embedding models were trained directly from the RRM162 dataset [82]. In contrast, 8-mer frequency features were used for the DNA sequences in the Homeo215 dataset [84].…”

Section: Applications For Molecular Interactions (mentioning)

confidence: 99%

Representation learning applications in biological sequence analysis

Iuchi, Matsutani, Yamada et al. 2021

Preprint


Remarkable advances in high-throughput sequencing have resulted in rapid data accumulation, and analyzing biological (DNA/RNA/protein) sequences to discover new insights in biology has become more critical and challenging. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention, because biological sequences are regarded as sentences and k-mers in these sequences as words. Embedding is an essential step in NLP, which converts words into vectors. This transformation is called representation learning and can be applied to biological sequences. Vectorized biological sequences can be used for function and structure estimation, or as inputs for other probabilistic models. Given the importance and growing trend in the application of representation learning in biology, here, we review the existing knowledge in representation learning for biological sequence analysis.
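The idea of treating k-mers as the "words" of a biological sequence can be illustrated with a minimal Python sketch. It is not taken from the review or from any of the cited tools: the function names, the toy sequences, and the choice of overlapping windows with a stride of 1 are all assumptions made for illustration.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int, stride: int = 1) -> list[str]:
    """Split a sequence into k-mers ("words"); overlapping when stride < k."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def kmer_frequencies(sequence: str, k: int) -> dict[str, float]:
    """Relative frequency of each k-mer, a simple fixed-vocabulary feature."""
    tokens = kmer_tokens(sequence, k)
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {kmer: n / len(tokens) for kmer, n in counts.items()}


# Hypothetical toy inputs, not real database entries.
protein_words = kmer_tokens("MKTAYIAK", 3)       # 3-mers for a protein
dna_features = kmer_frequencies("ACGTACGT", 4)   # 4-mer frequencies for DNA
print(protein_words)
print(dna_features)
```

The token list is the kind of "sentence" an embedding model would be trained on, while the frequency dictionary is a simpler count-based representation that needs no training.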


“…After obtaining the core compound targets, the gene symbol was converted into Ensembl ID through the Uniprot database (http://www.uniprot.org/) (22), imported into the website OmishareTools (http://www.omicshare.com/tools/index.php/) for GO enrichment function and KEGG enrichment analysis, and finally screened by the P value. GO enrichment mainly analyzed the biological process, cellular composition, and molecular function of the target, while KEGG enrichment could study the potential biological pathways and functions involved in the target.…”

Section: Enrichment Analysis Of GO and KEGG (mentioning)

confidence: 99%

Mechanism of E Lian Granule Reversing Chronic Atrophic Gastritis With Intestinal Metaplasia Based on Integrated Pharmacology and GEO Gene Chip

Gu, Xue, Xue et al. 2021

Preprint


Background: This study aimed to explore the main components and targets of E-Lian granule through which it reversed chronic atrophic gastritis with intestinal metaplasia, based on the traditional Chinese Medicine Integrated Pharmacology Network Computing Research Platform V2.0 (TCMIP V2.0) combined with GEO gene chips. It also aimed to construct various networks to predict and analyze the mechanism of E-Lian granule in treating gastric precancerous lesions. Methods: The effective traditional Chinese medicine components and targets of E-Lian granule prescription were obtained using TCMIP V2.0. The disease targets were collected using the TCMIP V2.0 platform and the verified gene chips in the GEO database, and the “drug components–targets” network, “compound–targets protein interaction network,” and “core compound targets–pathways network” were constructed using Cytoscape 3.6.1. The reliability of the predicted components and targets was verified using Pymol 1.7.2.1 and Autodock Vina 1.1.2 reverse molecular docking. Results: A total of 262 unique active components and 680 potential active targets of E-Lian granule were obtained. Moreover, 2247 unique disease targets of chronic atrophic gastritis with intestinal metaplasia were obtained by searching the “Disease/Symptom Target Database” combined with the GEO chip (GSE78523) and GeneCard database. Further, 178 complex targets and 38 complex core targets were obtained using Venn and Filter, respectively, such as ALB, TNF, PTGS2, RHOA, ESR1, HRAS, JUN, FOS, CASP3 and so forth. The GO and KEGG enrichment analyses showed that E-Lian granule reversed gastric precancerous lesions not only through the direct intervention of the cancer pathway, gastric cancer pathway, and epithelial signal transduction in Helicobacter pylori infection but also through PI3K/AKT, VEGF, MAPK, cAMP, cGMP, Th1/Th2, and other pathways. It also had a significant correlation with cholinergic, 5-hydroxytryptamine, dopaminergic, and other gastrointestinal hormone-related signals. Finally, the core target verified in the GSE78523 chip was successfully used to dock with the active components of E-Lian granules. The reliability of the prediction was also verified. Conclusions: The components and molecular mechanism of E-Lian granule in reversing chronic atrophic gastritis with intestinal metaplasia were predicted by integrated pharmacology, GEO chip, and reverse molecular docking, providing an important theoretical basis for further study of the effective substances and mechanism of E-Lian granule in treating chronic atrophic gastritis.

