N-Gram Similarity and Distance

Kondrak, Grzegorz

doi:10.1007/11575832_13

Cited by 214 publications

(109 citation statements)

References 7 publications

Supporting

Mentioning: 101

Contrasting

Unclassified

“…Damerau-Levenshtein [7], [8], Needleman-Wunsch [9], Longest Common Subsequence [10]. Smith-Waterman [11], Jaro [12], JaroWinkler [13], and N-gram [14], [15]. Character-based measure is useful for recognizing typographical errors, but it is useless in recognition of the rearranged terms (e.g.…”

Section: Text Similarity Algorithms (mentioning)

confidence: 99%

The performance of text similarity algorithms

Prasetya

Wibawa

Hirashima

2018

Int. J. Adv. Intell. Informatics

Text similarity measurement compares text with available references to indicate the degree of similarity between those objects. There have been many studies of text similarity and resulting in various approaches and algorithms. This paper investigates four majors text similarity measurements which include String-based, Corpus-based, Knowledge-based, and Hybrid similarities. The results of the investigation showed that the semantic similarity approach is more rational in finding substantial relationship between texts.

“…EFL learners often fail to recognise such words as cognates to the effect that they remain as unpredictable as any other non-cognate word (Nagy et.al., 1993). Orthographic similarity was checked when in doubt with the BI-SIM string comparison method (Kondrak, 2005) using the web interface designed by Bhargava at http://www.cs.toronto.edu/~aditya/strcmp2/. This method involves a comparison of all pairs of adjacent letters (bi-gram comparisons) in two orthographic strings (an English word and its Turkish equivalent in this case).…”

Section: Cognates (mentioning)

confidence: 99%

An English word list for Turkish academics = Türk akademisyenler için İngilizce kelime listesi

Öztürk

2018

Dil Dergisi
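
The citation statement above describes BI-SIM as a comparison of adjacent-letter pairs across two spellings. A minimal Python sketch of that idea follows. It is an illustration only, not Kondrak's reference implementation and not the code behind the web interface mentioned in the statement. The function name `bigram_similarity`, the prefix padding with the word's first letter, the half credit for bigrams that agree in one position, and the normalization by the longer word are all assumptions made for this example.

```python
def bigram_similarity(a: str, b: str) -> float:
    """Bigram-based similarity in [0, 1] (hypothetical helper, see assumptions above)."""
    if not a or not b:
        return 1.0 if a == b else 0.0

    # Assumption: pad each word with a copy of its first letter so that the
    # initial letter also takes part in a bigram.
    padded_a, padded_b = a[0] + a, b[0] + b
    grams_a = [padded_a[i:i + 2] for i in range(len(a))]
    grams_b = [padded_b[i:i + 2] for i in range(len(b))]

    def gram_score(x: str, y: str) -> float:
        # Assumption: a bigram pair earns 0.5 for each position that matches.
        return sum(cx == cy for cx, cy in zip(x, y)) / 2

    # Dynamic programming in the style of longest common subsequence:
    # table[i][j] is the best total score for the first i and j bigrams.
    rows, cols = len(grams_a), len(grams_b)
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1] + gram_score(grams_a[i - 1], grams_b[j - 1]),
            )

    # Assumption: normalize by the number of bigrams in the longer word.
    return table[rows][cols] / max(rows, cols)


if __name__ == "__main__":
    print(bigram_similarity("colour", "color"))
    print(bigram_similarity("colour", "table"))
```

Under these assumptions, `bigram_similarity("colour", "color")` evaluates to 0.75, and an unrelated pair such as "colour" and "table" scores considerably lower.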

“…There are several string-based techniques that could be applied for phonetic transcriptions similarity matching: Edit Distance - finds how dissimilar two strings are by counting the minimum number of operations required to transform one string into another; Jaro-Winkler measure (Winkler, 1999), N-gram similarity function (Kondrak, 2005), Soundex (Russell and Odell, 1918) - phonetic similarity measure, which principle of operation is based on the partition of consonants in the group with serial numbers from which then compiled the resulting value; Daitch-Mokotoff (Mokotoff, 1997) has much more complex conversion rules than in Soundex - now shaping the resulting code involved not only single characters, but also a sequence of several characters; Metaphone - transforms the original word with the rules of English language, using much more complex rules, and thus lost significantly less information as letters are not divided into groups (Euzenat and Shvaiko, 2013). In our solution we allow utilisation of several measuring functions with further weighted aggregation of the results (e.g., weighted product or weighted sum).…”

Section: Phonetic Similarity (mentioning)

confidence: 99%

Adaptive Vocabulary Learning Environment for Late Talkers

Gavriushenko

Khriyenko

Porokuokka

2016

Proceedings of the 8th International Conference on Computer Supported Education

Abstract: The main aim of this research is to provide children who have an early language delay with an adaptive way to train their vocabulary taking into account individuality of the learner. The suggested system is a mobile game-based learning environment which provides simple tasks where the learner chooses a picture that corresponds to a played back sound from multiple pictures presented on the screen. Our basic assumption is that the more similar the concepts (in our case, words) are, the harder the recognition task is. The system chooses the pictures to be presented on the screen by calculating the distances between the concepts in different dimensions. The distances are considered to consist of semantic, visual and auditory similarities. Each similarity factor can be measured with different methods. According to the user's feedback, the weights of the factors and similarity distance are adjusted to modify the level of difficulty in further iterations. The system is designed to attempt to retrieve knowledge about the learners by recognition of aspects that are difficult for them. Proposed solution could be considered as a self-adaptive system, which is trying to recognize individual model of the learner and apply it for further facilitation of his/her learning process. The use of the system will be demonstrated in future work.
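
The citation statement for this entry defines edit distance as the minimum number of operations needed to turn one string into another, and mentions combining several measures by weighted sum or weighted product. The Python sketch below illustrates both ideas under stated assumptions. It is not the cited system's code. The names `edit_distance`, `edit_similarity`, `weighted_sum` and `weighted_product` are hypothetical, the distance shown is the standard Levenshtein variant (insert, delete, substitute, each at cost 1), and normalizing the weights to sum to 1 is a choice made for this example.

```python
import math
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of similarity scores."""
    total = sum(weights)
    return sum(w * s for s, w in zip(scores, weights)) / total


def weighted_product(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted geometric mean of similarity scores."""
    total = sum(weights)
    return math.prod(s ** (w / total) for s, w in zip(scores, weights))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(weighted_sum([1.0, 0.0], [3, 1]))
```

A weighted product drops to zero whenever any single measure scores zero, whereas a weighted sum lets strong measures compensate for weak ones. Which behavior suits a given application depends on how the individual measures are expected to disagree.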
