N-Gram Similarity and Distance

Abstract. In many applications, it is necessary to algorithmically quantify the similarity exhibited by two strings composed of symbols from a finite alphabet. Numerous string similarity measures have been proposed. Particularly well-known measures are based on edit distance and the length of the longest common subsequence. We develop a notion of n-gram similarity and distance. We show that edit distance and the length of the longest common subsequence are special cases of n-gram distance and similarity, respectively. We provide formal, recursive definitions of n-gram similarity and distance, together with efficient algorithms for computing them. We formulate a family of word similarity measures based on n-grams, and report the results of experiments that suggest that the new measures outperform their unigram equivalents.
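The two unigram measures named in the abstract can be illustrated with standard dynamic programming. The sketch below is not the paper's n-gram algorithm; it shows only the textbook unit-cost edit distance and longest common subsequence length that the abstract describes as special cases. The function names are chosen here for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Unit-cost edit (Levenshtein) distance between x and y."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty string
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(lcs_length("ABCBDAB", "BDCABA"))
```

Both functions keep only two rows of the table, so memory use is linear in the length of the second string.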

Inflection Generation as Discriminative String Transduction

Nicolai

Cherry

Kondrak

2015

We approach the task of morphological inflection generation as discriminative string transduction. Our supervised system learns to generate word-forms from lemmas accompanied by morphological tags, and refines them by referring to the other forms within a paradigm. Results of experiments on six diverse languages with varying amounts of training data demonstrate that our approach improves the state of the art in terms of predicting inflected word-forms.

A theoretical evaluation of selected backtracking algorithms

Kondrak

Beek

1997

Artificial Intelligence

107

Cognates can improve statistical translation models

Kondrak

Marcu

Knight

2003

A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs

Annett

Kondrak

2008

108

With the ever-growing popularity of online media such as blogs and social networking sites, the Internet is a valuable source of information for product and service reviews. Attempting to classify a subset of these documents using polarity metrics can be a daunting task. After a survey of previous research on sentiment polarity, we propose a novel approach based on Support Vector Machines. We compare our method to previously proposed lexical-based and machine learning (ML) approaches by applying it to a publicly available set of movie reviews. Our algorithm will be integrated within a blog visualization tool.

Identifying cognates by phonetic and semantic similarity

Kondrak

2001

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on average nearly 75% of cognates at 50% precision.
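The two "orthographic" baselines mentioned above can be sketched as follows. This assumes the common definitions: LCSR as the longest common subsequence length divided by the length of the longer word, and Dice's coefficient computed over character bigrams. The abstract does not spell out which variants were used, so the bigram choice, the handling of empty input, and the function names are assumptions made here.

```python
from collections import Counter


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if cx == cy else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(x: str, y: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer length."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 0.0  # assumption: two empty strings score 0
    return lcs_length(x, y) / longest


def dice_bigrams(x: str, y: str) -> float:
    """Dice's coefficient over character bigrams, counted as multisets."""
    bx = Counter(x[i:i + 2] for i in range(len(x) - 1))
    by = Counter(y[i:i + 2] for i in range(len(y) - 1))
    total = sum(bx.values()) + sum(by.values())
    if total == 0:
        return 0.0  # assumption: words too short to have bigrams score 0
    shared = sum((bx & by).values())  # multiset intersection
    return 2 * shared / total


if __name__ == "__main__":
    print(lcsr("colour", "color"))
    print(dice_bigrams("night", "nacht"))
```

Both measures look only at spelling, which is the limitation the phonetic measure in the abstract is meant to address.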

Learning a spelling error model from search query logs

Ahmad

Kondrak

2005

Applying the noisy channel model to search query spelling correction requires an error model and a language model. Typically, the error model relies on a weighted string edit distance measure. The weights can be learned from pairs of misspelled words and their corrections. This paper investigates using the Expectation Maximization algorithm to learn edit distance weights directly from search query logs, without relying on a corpus of paired words.
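A weighted string edit distance of the kind the error model relies on can be sketched as below. This is a generic illustration, not the paper's method: the weight table, its `(intended, typed)` key convention with `""` standing for insertion or deletion, and the example weight are all hypothetical, and the Expectation Maximization step that would learn the weights is not shown.

```python
def weighted_edit_distance(x, y, costs, default=1.0):
    """Edit distance from x to y with per-operation weights.

    costs maps (a, b) to the cost of rewriting a as b; a == "" means
    inserting b and b == "" means deleting a. Operations missing from
    the table cost `default`. Matching characters cost nothing.
    """
    def cost(a, b):
        return 0.0 if a == b else costs.get((a, b), default)

    rows, cols = len(x) + 1, len(y) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + cost(x[i - 1], "")      # delete all of x[:i]
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + cost("", y[j - 1])      # insert all of y[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + cost(x[i - 1], ""),            # deletion
                d[i][j - 1] + cost("", y[j - 1]),            # insertion
                d[i - 1][j - 1] + cost(x[i - 1], y[j - 1]),  # substitution or match
            )
    return d[-1][-1]


if __name__ == "__main__":
    # Hypothetical weight: typing "e" for an intended "a" is a cheap error.
    weights = {("a", "e"): 0.25}
    print(weighted_edit_distance("separate", "seperate", weights))
    print(weighted_edit_distance("separate", "seperate", {}))
```

With an empty weight table the function reduces to unit-cost edit distance, so learned weights only change the result where the table makes a particular confusion cheaper or more expensive.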

On the syllabification of phonemes

Bartlett

Kondrak

Cherry

2009

Syllables play an important role in speech synthesis and recognition. We present several different approaches to the syllabification of phonemes. We investigate approaches based on linguistic theories of syllabification, as well as a discriminative learning technique that combines Support Vector Machine and Hidden Markov Model technologies. Our experiments on English, Dutch and German demonstrate that our transparent implementation of the sonority sequencing principle is more accurate than previous implementations, and that our language-independent SVM-based approach advances the current state-of-the-art, achieving word accuracy of over 98% in English and 99% in German and Dutch.

Grzegorz Kondrak

N-Gram Similarity and Distance
