1998
DOI: 10.1108/eum0000000007161

Applications of n‐grams in textual information systems

Abstract: This paper provides an introduction to the use of n-grams in textual information systems, where an n-gram is a string of n, usually adjacent, characters extracted from a section of continuous text. Applications that can be implemented efficiently and effectively using sets of n-grams include spelling error detection and correction, query expansion, information retrieval with serial, inverted and signature files, dictionary look-up, text compression, and language identification.
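As a rough illustration of the dictionary look-up and spelling-correction applications listed in the abstract, the sketch below builds an inverted file from character bigrams to dictionary words, then ranks candidate corrections for a misspelled query by the number of distinct bigrams they share with it. The function names, the choice of n = 2, the toy dictionary, and the overlap-count ranking are all illustrative assumptions, not the method described in the paper.

```python
from collections import defaultdict


def ngrams(word: str, n: int = 2) -> list[str]:
    """Overlapping character n-grams of a word (adjacent characters only)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(dictionary: list[str], n: int = 2) -> dict[str, set[str]]:
    """Inverted file: maps each n-gram to the set of dictionary words containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in dictionary:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def candidates(query: str, index: dict[str, set[str]], n: int = 2, top: int = 3) -> list[str]:
    """Hypothetical ranking: words sharing the most distinct n-grams with the query."""
    shared: dict[str, int] = defaultdict(int)
    for gram in set(ngrams(query, n)):
        # .get avoids inserting empty entries into the defaultdict index
        for word in index.get(gram, ()):
            shared[word] += 1
    # Highest overlap first; ties broken alphabetically for a stable order.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top]]


dictionary = ["potato", "tomato", "potash", "patio"]
index = build_index(dictionary)
print(candidates("potatoe", index))  # ['potato', 'potash', 'tomato']
```

Because scoring goes through the inverted file, only words that share at least one n-gram with the query are ever scored, rather than comparing the query against every dictionary entry.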

Cited by 70 publications (40 citation statements)
References: 90 publications
“…We can consider two bases for the characterisation and manipulation of text (Robertson and Willett, 1998): on the one hand, the individual characters that form the basis for the byte-level operations available to computers, and on the other, the individual words that are used by people (in this work represented by the spelling correction approaches previously discussed). These basic units can then be assembled into larger text segments such as sentences, paragraphs, etc.…”
Section: The N-gram Based Approach (mentioning, confidence: 99%)
“…Formally, an n-gram is a sub-sequence of n characters from a given word (Robertson and Willett, 1998). So, for example, we can split the word "potato" into four overlapping character 3-grams: -pot-, -ota-, -tat- and -ato-.…”
Section: The N-gram Based Approach (mentioning, confidence: 99%)
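The decomposition in this quotation can be reproduced with a short slice-based function. A word of length L yields L - n + 1 overlapping n-grams, so the six-letter "potato" gives exactly four 3-grams. The optional `pad` flag, which adds a boundary marker `_` at each end so that word-initial and word-final characters get n-grams of their own, is a common variant included here as an assumption; it is not part of the quoted definition.

```python
def char_ngrams(word: str, n: int, pad: bool = False) -> list[str]:
    """Return the overlapping character n-grams of `word`, in order."""
    if pad:
        # Hypothetical variant: mark word boundaries with a single '_' on each side.
        word = f"_{word}_"
    # One n-gram starts at each position 0 .. len(word) - n; none if the word is shorter than n.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams("potato", 3))            # ['pot', 'ota', 'tat', 'ato']
print(char_ngrams("potato", 3, pad=True))  # ['_po', 'pot', 'ota', 'tat', 'ato', 'to_']
```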
“…In n-gram matching, words are decomposed into n-grams, i.e., into substrings of length n (Pfeifer et al., 1996; Pirkola et al., 2002; Robertson and Willett, 1998; Salton, 1989). N-gram matching has been reported to be an effective technique among various approximate matching techniques in name searching (Pfeifer et al., 1996; Zobel and Dart, 1995) and cross-lingual spelling variant matching, and is an appropriate fuzzy matching technique for use with TRT.…”
Section: N-gram Matching (mentioning, confidence: 99%)
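N-gram matching needs some way to score how similar two decomposed strings are. The quotation does not name a measure, so the sketch below assumes the Dice coefficient over sets of character bigrams, 2|A ∩ B| / (|A| + |B|), which is one common choice, and uses it to rank candidate names against a query. The function names, the sample names, and the 0.3 threshold are hypothetical.

```python
def bigram_set(s: str) -> set[str]:
    """Distinct adjacent-character bigrams of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(s: str, t: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the bigram sets of s and t."""
    a, b = bigram_set(s), bigram_set(t)
    total = len(a) + len(b)
    if total == 0:
        # Neither string is long enough to yield a bigram; treat as no evidence of a match.
        return 0.0
    return 2 * len(a & b) / total


def best_matches(query: str, names: list[str], threshold: float = 0.3) -> list[str]:
    """Names scoring at least `threshold` against the query, best match first."""
    scored = [(dice(query, name), name) for name in names]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score >= threshold]


names = ["smith", "smyth", "schmidt", "jones"]
print(dice("smith", "smyth"))        # 0.5
print(best_matches("smith", names))  # ['smith', 'smyth']
```

Here "smith" and "smyth" share the bigrams `sm` and `th` out of four each, giving 2 × 2 / 8 = 0.5, while "schmidt" shares only `mi` and falls below the threshold.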
“…Approximate matching techniques include Soundex and Phonix, which compare words on the basis of their phonetic similarity (Gadd, 1990), edit distance (Zobel and Dart, 1996), and n-gram based matching (Robertson and Willett, 1998). In n-gram matching, text strings are decomposed into n-grams, i.e., substrings of length n, which usually consist of adjacent characters of the text strings.…”
Section: Introduction (mentioning, confidence: 99%)
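The qualifier "usually" matters: n-grams are normally built from adjacent characters, but variants that skip characters are also possible. The sketch below uses a hypothetical `gap` parameter, where `gap=0` reproduces ordinary adjacent n-grams and `gap=1` pairs characters one position apart. Such gapped n-grams are sometimes called skip-grams or s-grams; this particular formulation is an illustrative assumption, not one taken from the cited works.

```python
def gapped_ngrams(word: str, n: int = 2, gap: int = 0) -> list[str]:
    """n-grams whose consecutive characters are separated by `gap` skipped positions."""
    step = gap + 1             # distance between chosen characters
    span = (n - 1) * step + 1  # characters of `word` covered by one n-gram
    # Extended slicing word[i:i + span:step] picks n characters, skipping `gap` between each.
    return [word[i:i + span:step] for i in range(len(word) - span + 1)]


print(gapped_ngrams("potato"))         # ['po', 'ot', 'ta', 'at', 'to']
print(gapped_ngrams("potato", gap=1))  # ['pt', 'oa', 'tt', 'ao']
```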