2010
DOI: 10.1002/asi.21361

Human assessments of document similarity

Abstract: Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human inter-assessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length d…
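
The comparison the abstract describes, n-gram similarity scores set against human ratings, can be sketched roughly as follows. This is only an illustration: the abstract does not say how the n-grams were built or which correlation statistic was used, so character n-grams, cosine similarity, Pearson correlation, the example documents, and the human ratings are all assumptions made here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical documents and averaged human similarity ratings (not from the study).
docs = {
    "d1": "n-gram analysis of document similarity",
    "d2": "document similarity judged by human assessors",
    "d3": "weather forecast for the coming weekend",
}
pairs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
human = [4.0, 1.0, 1.5]  # hypothetical mean ratings, one per pair

for n in (2, 3, 5):  # vary n-gram length to see its effect on the association
    auto = [cosine(char_ngrams(docs[a], n), char_ngrams(docs[b], n)) for a, b in pairs]
    print(n, round(pearson(auto, human), 3))
```

Looping over several values of `n` mirrors the abstract's point that n-gram length influences the strength of association; the values printed here depend entirely on the made-up data.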

Citations: cited by 2 publications (3 citation statements)
References: 22 publications
“…Given appropriate term selection and weighting, the BOW model can produce cognitively plausible results (Westerman et al, 2010; Lee et al, 2005). However, the validity of resultant similarities can be compromised by vocabulary mismatch, which is the tendency for different words to be used to express the same concepts (Furnas, Landauer, Gomez, & Dumais, 1987). As terms are deemed independent, the semantic relationship between documents using synonymous or semantically related terms will remain obscured, wrongly suppressing similarity measures.…”
Section: B_BOW = A^T · A
Citation type: mentioning
confidence: 99%
“…Given appropriate term selection and weighting, the BOW model can produce cognitively plausible results (Westerman et al, 2010; Lee et al, 2005). However, the validity of resultant similarities can be compromised by vocabulary mismatch, which is the tendency for different words to be used to express the same concepts (Furnas, Landauer, Gomez, & Dumais, 1987).…”
Section: Discovering Latent Semantic Structure
Citation type: mentioning
confidence: 99%
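
The vocabulary mismatch problem described in the statements above can be illustrated with a small bag-of-words sketch. It assumes the citing paper's bag-of-words similarity is the product of a term-by-document count matrix A with its transpose (one reading of the section heading), and it uses raw, unweighted counts; the function names and example documents are hypothetical.

```python
def term_document_matrix(docs):
    """Build a term-by-document count matrix A (rows: terms, columns: documents)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    A = [[tokens.count(term) for tokens in tokenized] for term in vocab]
    return vocab, A


def bow_similarity(A):
    """Compute B = A^T A, so B[i][j] is the dot product of documents i and j."""
    n_docs = len(A[0]) if A else 0
    return [[sum(row[i] * row[j] for row in A) for j in range(n_docs)]
            for i in range(n_docs)]


# Hypothetical documents: the first two share a topic but no terms.
docs = [
    "car engine repair",
    "automobile motor maintenance",
    "car engine maintenance",
]
vocab, A = term_document_matrix(docs)
B = bow_similarity(A)

print(B[0][1])  # synonymous documents, no shared terms
print(B[0][2])  # documents sharing "car" and "engine"
```

Because each term is treated as an independent dimension, "car" and "automobile" contribute nothing to each other, so the first two documents receive a similarity of zero despite describing the same thing.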
“…None of the bag‐of‐words similarity measures approach this level. Furthermore, research has shown that estimates of interrater consistency based on partial document sets can be overoptimistic (Westerman, Cribbin, & Collins, ), which bolsters our method's performance.…”
Section: Evaluation Against Human Judgments
Citation type: mentioning
confidence: 99%