2014
DOI: 10.1002/asi.23240

On searching misspelled collections

Abstract: We describe an unsupervised, language-independent spelling correction search system. We compare the proposed approach with unsupervised and supervised algorithms. The described approach consistently outperforms other unsupervised efforts and nearly matches the performance of a current state-of-the-art supervised approach.

Citations: cited by 2 publications (3 citation statements)
References: 9 publications (11 reference statements)

“…For completeness, Segments is a system that takes an input string, and using 6 substring rules, returns a list of possible correction candidates derived from a lexicon, ranked by similarity. A detailed Segments description is found in [21,22,20,23]. Recent research has reaffirmed the potential of segmenting strings by using said segments to perform authorship attribution [19].…”
Section: Segments (mentioning)
confidence: 99%
“…UNLV [25], IMPACT [26]), but these datasets are not applicable to our work, as these datasets do not provide means to accurately evaluate our system; namely, they are lacking query relevance (qrel) judgments. Without those, we would only be measuring the correction accuracy of Segments, which has already been exhaustively studied in prior papers using heterogeneous datasets [27], [17], [18]. Therefore, despite the age of the TREC collection, it remains the only collection that provides ground truth, corrupted text, and 3rd party qrel judgments, in a publicly available package.…”
Section: B Limitations (mentioning)
confidence: 99%
“…Over the past years, we evaluated methods for reliably correcting phase one errors via post-processing using our method called Segments [17], [18], [19]. Segments differs from previous research in that it is an unsupervised approach, which makes minimal assumptions about resource availability, and has no dependence on language within the algorithm.…”
Section: Introduction (mentioning)
confidence: 99%