2016
DOI: 10.1145/2890510

Measuring Similarity Similarly

Abstract: Several intelligent technologies designed to improve navigability in and digestibility of text corpora use topic modeling such as the state-of-the-art Latent Dirichlet Allocation (LDA). This model and variants on it provide lower-dimensional document representations used in visualizations and in computing similarity between documents. This article contributes a method for validating such algorithms against human perceptions of similarity, especially applicable to contexts in which the algorithm is intended to …

Cited by 26 publications (8 citation statements)
References 44 publications
“…We also restricted participation to those who had at least 500 assignments approved by other requesters and a 95% overall approval rating. (This is the same as in [27]). In the main experiment, we also excluded from analysis two participants who wrote keyboard-mashing strings in unselected "other" boxes on demographic questions, and two participants who took steps to defeat the participation limits.…”
Section: Participants and Filters
Citation type: mentioning
confidence: 81%
“…In order to better cover the space of available proposals, we used a constraint satisfaction solver to maximize diversity by selecting the set of four focal proposals by four different authors that were on average maximally different from each other according to a previously studied LDA/cosine similarity measure [27] (the same as used in "Selecting topically related proposals" below). The solver we used was Excel 2013's "Evolutionary" solver, which produced better results than its "GRG nonlinear" solver.…”
Section: Selecting Focal Proposals
Citation type: mentioning
confidence: 99%
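The selection step described in the statement above (four focal proposals by four different authors, on average maximally different under an LDA/cosine similarity measure) can be illustrated with a small sketch. This is not the citing authors' procedure: they report using Excel 2013's "Evolutionary" solver, whereas the sketch below swaps in an exhaustive search, which is only practical for small collections. The function names, the `authors` list, and the assumption that each proposal is already represented as an LDA topic-proportion vector are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_diverse_subset(topic_vectors, authors, k=4):
    """Return (indices, mean_similarity) of the k items with distinct authors
    whose mean pairwise cosine similarity is lowest.

    Assumption: topic_vectors[i] is a topic-proportion vector for item i
    (for example, from LDA) and authors[i] is that item's author.
    Exhaustive search stands in for the solver mentioned in the text.
    """
    if k < 2:
        raise ValueError("k must be at least 2 to form pairs")
    best_subset, best_score = None, float("inf")
    for subset in combinations(range(len(topic_vectors)), k):
        # Enforce the "k different authors" constraint.
        if len({authors[i] for i in subset}) < k:
            continue
        pairs = list(combinations(subset, 2))
        mean_sim = sum(
            cosine_similarity(topic_vectors[i], topic_vectors[j]) for i, j in pairs
        ) / len(pairs)
        # Strict comparison keeps the first subset found among ties.
        if mean_sim < best_score:
            best_subset, best_score = subset, mean_sim
    return best_subset, best_score
```

Minimizing mean pairwise similarity is one reading of "on average maximally different"; for larger collections the number of subsets grows combinatorially, which is presumably why a heuristic solver was used in the quoted study.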
“…Experiments use JS divergence as an information-theoretically motivated metric in the probabilistic space created by topic models. Since it is a smoothed and symmetric alternative to the KL divergence, which is a standard measure for comparing distributions [39], it has been extensively used as state-of-the-art metric over topic distributions in literature [1,31,38]. Our upper bound is created from the brute-force comparison of the reference documents with all documents in the collection to obtain the list of similar documents.…”
Section: Datasets and Evaluation Metrics
Citation type: mentioning
confidence: 99%
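For readers unfamiliar with the metric named in the statement above, the following is a minimal sketch of Jensen-Shannon (JS) divergence between two topic distributions, together with a brute-force ranking of a collection against a reference document. It assumes each document is already a probability vector over topics; the function names and the base-2 logarithm are illustrative choices, not details taken from the citing paper.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever p_i is.
    """
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence of p and q
    from their midpoint m. Symmetric, and bounded by 1 with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_by_js(reference, collection):
    """Brute-force comparison: indices of collection sorted from most to
    least similar to the reference (smallest JS divergence first)."""
    scored = [(js_divergence(reference, doc), idx) for idx, doc in enumerate(collection)]
    return [idx for _, idx in sorted(scored)]
```

Because the midpoint `m` is positive wherever either input is, the JS divergence stays finite even when the two distributions do not share support, which is the sense in which it is a smoothed alternative to KL divergence.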