2006
DOI: 10.1007/11760146_9

Analyzing Entities and Topics in News Articles Using Statistical Topic Models

Abstract: Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a novel combination of statistical topic models and named-entity recognizers to jointly analyze entities mentioned (persons, organizations and places) and topics discussed in a collection of 330,000 New York Times news articles. We demonstrate an analytic framework which automatically extracts from a large collection: topics…

Cited by 80 publications (52 citation statements)
References 15 publications
“…It is this specific use that led us to applying them to incident reports. Former successful experiments have been run on collection of scientific publications (Hall et al, 2008), newspaper articles (Newman et al, 2006) and encyclopedia entries (Blei, 2012).…”
Section: Topic Modelling Of Incident Reports (mentioning)
confidence: 99%
“…A topic is then associated with five clinical measurements, including age, progress status, genotype, clearance (natural healing) and duration of infection with varying proportions, to show the probability distributions of the terms to constitute the topic (for example, age between 20-30 with a probability of 20%, progress 'a' with 10% and genotype 16 with 40%, clear '0' with 10% and duration with 20%). A large body of works exists that have applied LDA to a number of tasks, including news article analysis (17), cancer prediction (18) and analysis of scientific ideas (19). However, limited studies have been directed towards identifying complex interactions of clinical measurements, as proposed in the present study, in association with HPV-positive patients.…”
Section: Results (mentioning)
confidence: 99%
“…A key feature of this model is that it is an unsupervised learning technique, which means that the often human-intensive task of finding labelled examples is completely eliminated. Unsupervised also means that one can model a collection of documents through topics without being a domain expert - in fact one can even model a collection of documents through topics in other languages, without needing to know much about the language (Newman et al 2006). Up to 7482 different messages were considered in this study.…”
Section: Debian Linux Port To Arm Architecture (mentioning)
confidence: 99%
“…After pre-processing the content of 12,022 posts, we have extracted an initial vocabulary of more than 120,000 words. The final length of the vocabulary (W = 410) has been selected considering the occurrence frequency rate (Ng et al 2001).…”
Section: Debian Linux Port To Arm Architecture (mentioning)
confidence: 99%