2006
DOI: 10.1186/1471-2105-7-58

Identifying biological concepts from a protein-related corpus with a probabilistic topic model

Abstract: Background: Biomedical literature, e.g., MEDLINE, contains a wealth of knowledge regarding functions of proteins. Major recurring biological concepts within such text corpora represent the domains of this body of knowledge. The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE titles and abstracts by applying a probabilistic topic model.
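The abstract describes applying a probabilistic topic model to protein-related MEDLINE titles and abstracts; the citing statements below identify the model as LDA. Purely as an illustration of that setup, not the authors' actual pipeline, here is a minimal sketch using scikit-learn's LatentDirichletAllocation; the toy abstracts and the topic count of 2 are hypothetical placeholders.

```python
# Minimal sketch (not the paper's code): fit an LDA topic model to a handful
# of protein-related abstracts. `abstracts` and the topic count are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "protein kinase phosphorylation signal transduction pathway",
    "dna binding transcription factor gene expression regulation",
    "membrane transport ion channel receptor ligand binding",
]

# Bag-of-words counts; word order is ignored, matching the LDA assumption.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)

# The number of topics must be chosen up front; it is an input parameter.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # per-document topic proportions

print(doc_topic)  # each row: the mixture of topics for one abstract
```

Each document comes out as a mixture over topics rather than a single label, which is the property the citing statements below emphasise.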

Cited by 55 publications (43 citation statements); references 24 publications.
“…Although the documents, or abstracts, are known and observed, the topics are hidden or latent (Piepenbrink & Nurmammadov, 2015). This allows for modelling at a fine granularity as it realistically sees texts as made up of different topics rather than being "about" one topic alone (Zheng, Mclean, & Lu, 2006).…”
Section: Topic Models - LDA (mentioning)
confidence: 99%
“…First, it sees a document as a bag of words, where the order of words is inconsequential for our analysis (Blei et al, 2003; Grimmer & Stewart, 2013). The choice of the correct number of topics is crucial as it determines the granularity of the results and the fit of the model for the data, that is, how well the model describes the underlying data (Griffiths & Steyvers, 2004; Zheng et al, 2006). Second, it is based on the assumption that the number of topics k is fixed and known, which is an input parameter of the LDA.…”
Section: Topic Models - LDA (mentioning)
confidence: 99%
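The statement above points to two assumptions: documents are bags of words, and the topic count k is a fixed input whose choice determines granularity and model fit. A minimal sketch of what that looks like in practice, again assuming scikit-learn rather than anything from the paper; the placeholder documents and candidate k values are arbitrary, and a real evaluation would score held-out documents.

```python
# Hypothetical sketch: k is a required LDA input; sweep a few candidate topic
# counts and compare fit via perplexity (lower indicates better fit). In
# practice this comparison would use held-out documents, not the training set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # placeholder stand-ins for MEDLINE titles/abstracts
    "protein kinase phosphorylation signalling pathway",
    "transcription factor binds dna and regulates gene expression",
    "ion channel mediates membrane transport of calcium",
    "enzyme catalyses hydrolysis of the peptide substrate",
    "receptor ligand interaction triggers downstream signalling",
    "mutation in the gene alters protein folding and stability",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)

for k in (2, 3, 5):  # candidate topic counts, chosen arbitrarily here
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(counts)
    print(f"k={k}  perplexity={lda.perplexity(counts):.1f}")
```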
“…Thus, LDA automatically finds topics in a text, or in other words, LDA attempts to go back from the document and find the set of topics that may have generated it. Zheng, McLean, and Lu (2006) make use of LDA to identify biological topics, i.e., concepts, from a corpus composed of biomedical articles that belong to MEDLINE; to that end, first, they use LDA to identify the most relevant concepts, and subsequently, these concepts are mapped to a biomedical vocabulary: Gene Ontology.…”
Section: Latent Dirichlet Allocation (mentioning)
confidence: 99%
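Per the statement above, the learned topics are read off as lists of high-probability words (the "concepts"), which Zheng, McLean, and Lu then map onto the Gene Ontology vocabulary. A minimal sketch of that first step, assuming an LDA model and vectorizer fitted as in the sketch after the abstract; the GO mapping itself is domain-specific and is not shown.

```python
# Hypothetical sketch: list the top words of each fitted LDA topic. These word
# lists are the "concepts" that would subsequently be mapped to a controlled
# vocabulary such as Gene Ontology (mapping step not shown).
import numpy as np

def top_words_per_topic(lda, vectorizer, n_top=10):
    """Return the n_top highest-weight words for every learned topic."""
    vocab = vectorizer.get_feature_names_out()
    topics = []
    for weights in lda.components_:          # one row of word weights per topic
        best = np.argsort(weights)[::-1][:n_top]
        topics.append([vocab[i] for i in best])
    return topics

# Usage, assuming `lda` and `vectorizer` fitted as in the earlier sketch:
# for words in top_words_per_topic(lda, vectorizer):
#     print(", ".join(words))
```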