2014
DOI: 10.1016/j.langsci.2014.06.008

Phonotactics in morphological similarity metrics

Cited by 4 publications
(5 citation statements)
References 9 publications
“…We approach this problem by sorting the list of possible base words for a lexeme from the most plausible ones to the least probable ones. Because sorting a whole lexicon for each lexeme is infeasible, we restrict the list of possible base words to the 100 most similar words according to Proxinette measure (Hathout, 2009; Hathout, 2014). Such constructed candidate list is sorted by a ranker, which is previously trained on a relatively small training set provided by a linguist.…”
Section: Machine Learning Approach: Polish and Spanish (mentioning)
confidence: 99%
“…In order to avoid the consideration of all possible parents for each lexeme, only a candidate set of most morphologically similar words is considered. As a measure of morphological similarity, the Proxinette distance (Hathout, 2009; Hathout, 2014) is used. Next, the previously trained ranker is applied to order each candidate set.…”
Section: Introduction (mentioning)
confidence: 99%
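
The candidate-restriction step described in the two statements above (keep only the most similar words, then hand them to a ranker) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `similarity` and `shortlist` are hypothetical, and the similarity function is Python's `difflib` sequence matcher ratio used as a stand-in, not the Proxinette measure, whose definition is not given here.

```python
import heapq
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity (assumption): difflib's sequence matcher ratio.

    This is NOT Proxinette; it only plays the role of "some similarity score".
    """
    return SequenceMatcher(None, a, b).ratio()


def shortlist(lexeme: str, lexicon: list[str], k: int = 100) -> list[str]:
    """Return the k words most similar to `lexeme`, most similar first.

    The lexeme itself is excluded, since a word is not its own base word.
    """
    candidates = (w for w in lexicon if w != lexeme)
    return heapq.nlargest(k, candidates, key=lambda w: similarity(lexeme, w))


if __name__ == "__main__":
    lexicon = ["teach", "teacher", "table", "reach", "zebra"]
    print(shortlist("teacher", lexicon, k=2))
```

A trained ranker would then reorder only this shortlist, which avoids scoring the whole lexicon for every lexeme.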
“…Commonly employed measures are as follows: sequence matcher ratio, Damerau–Levenshtein distance, normalized Damerau–Levenshtein distance, Jaccard distance, Masi distance, and Jaro–Winkler similarity distance. These measures have many applications in language research, biology (such as in DNA and RNA analysis), and data mining (Bisani & Ney, 2008; Damper & Eastmond, 1997; Ferragne & Pellegrino, 2010; Gillot et al, 2010; Hathout, 2014; Heeringa et al, 2009; Hixon et al, 2011; Jelinek, 1996; Kaiser et al, 2002; Navarro, 2001; Peng et al, 2011; Riches et al, 2011; Schlippe et al, 2010; Schlüter et al, 2010; Spruit et al, 2009; Tang & van Heuven, 2009; Wieling et al, 2012). In a study by Smith et al (2019), the phonemic edit distance ratio, which is an automatic distance function, was employed to estimate error frequency analysis for evaluating the speech production of individuals with acquired language disorders, such as apraxia of speech and aphasia with phonemic paraphasia, highlighting the efficacy of distance metrics in automating manual measures in the context of language pathology.…”
Section: Alternative Approaches Using “Distance Functions” (mentioning)
confidence: 99%
“…For normalizing Twitter data, Jin uses bigrams, skip-1-bigrams and sets the weight for each feature to 1. Barteld et al (2015) use yet another similarity measure, Proxinette (Hathout, 2014), for spelling variant detection. Similar to the Jaccard Index, Proxinette uses similarity features to compute the similarity of two types.…”
Section: Dealing With Spelling Variation (mentioning)
confidence: 99%
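
The last statement compares Proxinette to the Jaccard Index computed over similarity features. A minimal sketch of that general idea is below, assuming character trigrams with boundary markers as the features. The feature choice and the names `char_ngrams` and `jaccard` are illustrative assumptions; the features and weighting Proxinette actually uses are not specified in the statements above.

```python
def char_ngrams(word: str, n: int = 3) -> set[str]:
    """Character n-grams of a word padded with boundary markers (assumed features)."""
    padded = f"^{word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard index: shared features divided by all features of either word."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0  # two empty feature sets are treated as identical
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    # Spelling variants share most of their trigrams.
    print(jaccard("color", "colour"))
    print(jaccard("abc", "xyz"))
```

Every feature has weight 1 in this sketch, which matches the plain Jaccard Index; a weighted variant would replace the set sizes with sums of feature weights.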