Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2009
DOI: 10.1145/1557019.1557044
Connections between the lines

Abstract: Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora. In this paper we present a novel probabilistic topic …

Cited by 193 publications (151 citation statements)
References 25 publications
“…We note that an LDA model uses randomness in its training and inference, therefore training a new model with the same parameters will always yield slightly different topic distributions and there are different ways and limitations to analyze those [35].…”
Section: To What Extent Is the War Figurative Frame Used To Talk Abou…
Citation type: mentioning (confidence: 99%)
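The nondeterminism this excerpt describes is easy to demonstrate. Below is a minimal sketch, assuming gensim and a toy corpus of our own invention (not the cited study's data): two LDA runs with identical hyperparameters but different random seeds generally learn different topic-word distributions.

```python
# Minimal sketch (illustrative only): LDA training is stochastic, so two runs
# on the same corpus with the same hyperparameters but different seeds yield
# different topics. Assumes gensim is installed; the corpus is a toy example.
from gensim import corpora, models

docs = [
    ["network", "nodes", "edges", "graph"],
    ["topic", "model", "words", "documents"],
    ["graph", "edges", "topic", "words"],
]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Identical settings except for the random seed.
lda_a = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=1)
lda_b = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=2)

# The two learned topic-word distributions will typically differ.
print(lda_a.show_topics())
print(lda_b.show_topics())
```

Fixing random_state makes a single run reproducible, but it does not remove the sensitivity the excerpt points to: a different seed is still, in effect, a different model.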
“…We assessed success in our case study in three ways: (1) by the effectiveness of the process in leading non-experts to drill down to highly-relevant content in a very large collection of books; (2) by the ability of this process to spotlight a somewhat forgotten woman scientist who is important to the history of psychology; (3) by the capacity of the process to lead domain experts to a surprising discovery about the breadth of species discussed in these historical materials, thus enriching the historical context for current discussions of intelligence in microscopic organisms [34, 35]. Our assessments are qualitative rather than quantitative in nature, but they are appropriate given current limitations in quantitative assessments of the quality of topic models [36–38].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Previous studies indicate a general consensus that human judgments about what makes a “good” topic are generally convergent. However, human judgment does not typically correlate well with quantitative measures of model fit [36], suggesting that people are interpreting the topics using as-yet poorly understood semantic criteria. Furthermore, variation among people in their interpretation of topic quality may be dependent upon expertise.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
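For context on the “quantitative measures of model fit” this excerpt refers to, the sketch below, again assuming gensim and a toy corpus of our own, computes two common automatic measures: per-word log perplexity (model fit) and c_v topic coherence (a proxy for interpretability). The point cited from [36] is precisely that such metrics often fail to track human judgments of topic quality.

```python
# Minimal sketch (illustrative only) of two automatic topic-quality measures.
# Assumes gensim; the corpus is a toy example, not data from any cited paper.
from gensim import corpora, models
from gensim.models import CoherenceModel

docs = [["gene", "cell", "protein"], ["court", "law", "judge"],
        ["gene", "protein", "cell"], ["law", "court", "judge"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)

# Model fit: variational bound on per-word log perplexity (computed on the
# training corpus here for brevity; held-out data would be used in practice).
print("log perplexity:", lda.log_perplexity(corpus))

# Coherence: how well a topic's top words co-occur in the reference texts.
cm = CoherenceModel(model=lda, texts=docs, dictionary=dictionary, coherence="c_v")
print("c_v coherence:", cm.get_coherence())
```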
“…With regard to the second class of models for networks extracted from text, Named Entity Recognition methods are typically used to identify the nodes and co-occurrence (or other language analysis approaches) to create edges among them [48, 24]. In this case, the output network connects different portions of a text document, or concepts extracted from the text.…”
Section: Text and Topology
Citation type: mentioning (confidence: 99%)
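The generic pipeline this excerpt describes can be sketched concretely. The code below is an illustrative sketch assuming spaCy and networkx, not the specific method of [48] or [24]: named entities become nodes, and entities co-occurring in the same sentence are linked by a weighted edge.

```python
# Minimal sketch (illustrative only): build a co-occurrence network from text.
# Assumes spaCy with the en_core_web_sm model and networkx are installed.
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("Ada Lovelace corresponded with Charles Babbage in London. "
        "Charles Babbage later visited Ada Lovelace.")

graph = nx.Graph()
for sent in nlp(text).sents:
    # Nodes: person entities found by NER in this sentence.
    people = sorted({ent.text for ent in sent.ents if ent.label_ == "PERSON"})
    # Edges: one weight increment per sentence-level co-occurrence.
    for a, b in itertools.combinations(people, 2):
        weight = graph.get_edge_data(a, b, default={"weight": 0})["weight"]
        graph.add_edge(a, b, weight=weight + 1)

print(graph.edges(data=True))
```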
“…For example, HINs have been used in the past to model co-occurrence relations between entities (e.g., famous characters, sports, companies) in Wikipedia articles [26]. In [24] vertices represent either famous characters from the text or bags of words, while the edges connect words that best explain the contexts where two or more famous characters appear together in the text. Document-phrase graphs as defined in [23] are also HIN-based models, and more in detail probabilistic bipartite networks B = (V, U, E, W) where the vertices in one partition V represent documents from a large document collection, the vertices in U represent salient phrases which are semantically relevant to one or more documents in V, and edges E indicate the relevance of each sentence for each document.…”
Section: Text and Topology
Citation type: mentioning (confidence: 99%)
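The bipartite document-phrase network B = (V, U, E, W) described at the end of this excerpt can likewise be written down directly. The example below assumes networkx; the documents, phrases, and relevance weights are invented for illustration and are not taken from [23].

```python
# Minimal sketch (illustrative only) of a weighted bipartite document-phrase
# network B = (V, U, E, W). Assumes networkx; all values are made up.
import networkx as nx

B = nx.Graph()
documents = ["doc1", "doc2"]                 # partition V: documents
phrases = ["topic model", "social network"]  # partition U: salient phrases
B.add_nodes_from(documents, bipartite=0)
B.add_nodes_from(phrases, bipartite=1)

# W: edge weights encode how relevant each phrase is to each document.
B.add_edge("doc1", "topic model", weight=0.9)
B.add_edge("doc1", "social network", weight=0.4)
B.add_edge("doc2", "social network", weight=0.8)

for u, v, data in B.edges(data=True):
    print(u, "<->", v, "relevance:", data["weight"])
```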