2020
DOI: 10.1093/database/baaa026

UPCLASS: a deep learning-based classifier for UniProtKB entry publications

Abstract: In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizin…

Cited by 7 publications
(5 citation statements)
References 11 publications
“…Deep learning approaches like Convolutional Neural Networks (CNN) (Lecun et al, 1998; Teodoro et al, 2020), Recurrent Neural Networks (RNN) (Rumelhart et al, 1986), Long Short-Term Memory Networks (LSTM) (Hochreiter and Schmidhuber, 1997), and Transformer-based architectures (Vaswani et al, 2017), including pretrained language models such as BERT (Devlin et al, 2018), RoBERTa (Liu et al, 2019), and XL-Net (Yang et al, 2019), have demonstrated state-of-the-art efficacy in a diverse range of domains. Leveraging the hierarchical structure of documents, graph neural networks (GNNs) have also been effectively proposed to assign categories to biomedical documents (Ferdowsi et al, 2023, 2022, 2021).…”
Section: Related Work
confidence: 99%
“…Automatic text classification appears as an essential methodology to ensure high quality of living evidence updates. Text classification consists of assigning categorical labels to a given text passage (e.g., an abstract) based on its similarity to the existing labeled examples [ 23 25 ]. Classical text classifiers use statistical document representations, in which the relevance of a word to a document is proportional to its frequency in the document and inversely proportional to its frequency in the collection (the so-called term frequency-inverse document frequency (tf-idf) framework), to create vectorial representations of the documents [ 26 ].…”
Section: Introduction
confidence: 99%
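
The tf-idf weighting described in the statement above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited paper: the function name `tf_idf`, the toy documents, and the specific variant (relative term frequency multiplied by an unsmoothed natural-log inverse document frequency) are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "protein function annotation".split(),
    "protein interaction network".split(),
    "protein expression in tissue".split(),
]
weights = tf_idf(docs)
```

In this toy collection "protein" occurs in every document, so its weight is zero, while a term such as "function" that occurs in only one document receives a positive weight. Practical implementations usually add smoothing and length normalization, which this sketch leaves out.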
“…To address these issues, automated and augmented curation systems for extracting protein functional data from scientific literature are becoming increasingly desired. In particular, Machine Learning and Natural Language Processing techniques are beginning to be employed for biocuration efforts [1, 2] for extracting and organising unstructured biological information into a structured form that is accessible to biologists. Central to these automated systems, is the process of unambiguously extracting semantic relationships between two or more biological entities in the literature [3].…”
Section: Introduction
confidence: 99%