2014
DOI: 10.1007/978-3-319-10085-2_6

Fast Phonetic Similarity Search over Large Repositories

Abstract: Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources that uses a data structure called PhoneticMap to encode language-specific phonetic information. We vali…

Cited by 4 publications (6 citation statements)
References 12 publications
“…However, to avoid any information loss, the parameters t and e are the original input token and entry, not the phonetic representation used along the previous step. The approach does not define a new similarity function, but it uses existing ones, such as the Jaro-Winkler or String Sim [13] metrics. The result of the similarity function is used in two filtering rules.…”
Section: Filtering the Results (mentioning)
confidence: 99%
“…One of the earliest systems for calculating phonetic similarity is Soundex, first used to classify and disambiguate personal names in studies of the United States Census in the 1930s (Stephenson 1974). Soundex-like systems that calculate phonetic similarity based on orthography alone are used in informatics applications such as information retrieval and spell-check (Philips 1990; Tissot, Peschl, and Fabro 2014). Approaches to quantifying phonetic similarity that specifically involve phonetic features have been used by linguists to study synchronic language variation (Ladefoged 1969), diachronic language change (Nerbonne 2010), and sound patterning in phonology (Mielke 2012).…”
Section: Background and Related Research (mentioning)
confidence: 99%
“…ED operates between two input strings – ED(w1, w2) – and returns the minimum number of operations (single-character edits) required to transform string w1 into w2. Other examples and variations of string similarity metrics include Jaro-Winkler Distance [9], Hamming Distance [13], and String Sim [14]. However, string distance measures tend to ignore the relative likelihood errors.…”
Section: Approximate String Match (mentioning)
confidence: 99%
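
The edit distance described in the statement above can be illustrated with a minimal sketch. It assumes the classic Levenshtein formulation (insertion, deletion, and substitution each cost 1); the function name `edit_distance` is illustrative and is not taken from the cited papers.

```python
def edit_distance(w1: str, w2: str) -> int:
    """Minimum number of single-character edits turning w1 into w2.

    Assumption: insertions, deletions, and substitutions all cost 1
    (the classic Levenshtein distance).
    """
    # prev[j] is the distance between the processed prefix of w1 and w2[:j].
    prev = list(range(len(w2) + 1))
    for i, c1 in enumerate(w1, start=1):
        curr = [i]  # turning w1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(w2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute c1 with c2, or keep a match
            ))
        prev = curr
    return prev[-1]
```

Because every edit has the same cost, this measure treats a likely misspelling and an unlikely one alike, which is the limitation the statement points to.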
“…Soundex [15] is an example of a phonetic matching scheme initially designed for English that uses codes based on the sound of each letter to translate a string into a canonical form of at most four characters, preserving the first letter. In addition, phonetic similarity metrics are able to assign a high score even though comparing dissimilar pairs of strings that produce similar sounds [14, 16]. As the result, phonetically similar entries will have the same (or similar) keys and they can be indexed for efficient search using some hashing method.…”
Section: Approximate String Match (mentioning)
confidence: 99%
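
The Soundex keying and hash-based indexing described in the statement above can be sketched as follows. This assumes the common American Soundex variant, which pads keys with zeros to exactly four characters; the names `soundex` and `build_index` are illustrative, and this is not the PhoneticMap structure from the paper.

```python
from collections import defaultdict

# Letter-to-digit table of American Soundex; vowels, H, W, and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return a four-character key: the first letter followed by three digits."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    last = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != last:
            key += code
        last = code  # a vowel resets 'last', so a repeated code after it counts
        if len(key) == 4:
            break
    return key.ljust(4, "0")


def build_index(entries):
    """Group dictionary entries under their phonetic key for hash lookup."""
    index = defaultdict(list)
    for entry in entries:
        index[soundex(entry)].append(entry)
    return index
```

With such an index, looking up the key of a misspelled token retrieves all entries that share its phonetic key in a single hash access, after which a string similarity function can rank the candidates.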