Inflection Generation as Discriminative String Transduction

We approach the task of morphological inflection generation as discriminative string transduction. Our supervised system learns to generate word-forms from lemmas accompanied by morphological tags, and refines them by referring to the other forms within a paradigm. Results of experiments on six diverse languages with varying amounts of training data demonstrate that our approach improves the state of the art in terms of predicting inflected word-forms.


Cross-Linguistic Syntactic Evaluation of Word Prediction Models

Mueller, Nicolai, Petrou-Zeniou et al. 2020


A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.


The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

McCarthy, Vylomova, Wu et al. 2019


The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year's strong baselines or highly ranked systems from previous years' shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines. Data for Task 1, language pairs: We presented data in 100 language pairs spanning 79 unique languages. Data for all but four languages (Basque, Kurmanji, Murrinhpatha, and Sorani) are extracted from English Wiktionary, a large multi-lingual crowdsourced dictionary with morphological paradigms
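The abstract above reports two metrics for the inflection task: accuracy and Levenshtein distance. As a rough illustration of how the two can diverge, here is a minimal Python sketch that scores predicted word-forms against references. The function names `levenshtein` and `score` and the toy data are assumptions made for this example; this is not the shared task's official evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def score(predictions, references):
    """Return (exact-match accuracy, mean Levenshtein distance)."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    n = len(references)
    pairs = list(zip(predictions, references))
    accuracy = sum(p == r for p, r in pairs) / n
    mean_distance = sum(levenshtein(p, r) for p, r in pairs) / n
    return accuracy, mean_distance


if __name__ == "__main__":
    # Hypothetical system output: one correct form, one over-regularized form.
    print(score(["walked", "runned"], ["walked", "ran"]))
```

Accuracy only counts exact matches, while mean distance also penalizes how far each wrong form is from the reference, so a system can gain on one metric without gaining on the other.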


Morphological Reinflection via Discriminative String Transduction

Nicolai, Hauer, Arnaud et al. 2016


Pattern Classification in No-Limit Poker: A Head-Start Evolutionary Approach

Beattie, Nicolai, Gerhard et al. 2007


Abstract. We have constructed a poker classification system which makes informed betting decisions based upon three defining features extracted while playing poker: hand value, risk, and aggressiveness. The system is implemented as a poker player agent, and as such, the goals of the classifier are not only to correctly determine whether each hand should be folded, called, or raised, but to win as many chips as possible from the other players. The decision space is found by evolutionary methods, starting from a designed initial state. Our results showed that evolving an agent from a data-driven "head-start" position resulted in the best performance over agents evolved from scratch, random agents, data-driven agents, and "always fold" agents (a surprisingly effective strategy).


Leveraging Inflection Tables for Stemming and Lemmatization.

Nicolai, Kondrak 2016


We present several methods for stemming and lemmatization based on discriminative string transduction. We exploit the paradigmatic regularity of semi-structured inflection tables to identify stems in an unsupervised manner with over 85% accuracy. Experiments on English, Dutch and German show that our stemmers substantially outperform Snowball and Morfessor, and approach the accuracy of a supervised model. Furthermore, the generated stems are more consistent than those annotated by experts. Our direct lemmatization model is more accurate than Morfette and Lemming on most datasets. Finally, we test our methods on the data from the shared task on morphological reinflection.


Multiple System Combination for Transliteration

Nicolai, Hauer, Salameh et al. 2015


We report the results of our experiments in the context of the NEWS 2015 Shared Task on Transliteration. We focus on methods of combining multiple base systems, and leveraging transliterations from multiple languages. We show error reductions over the best base system of up to 10% when using supplemental transliterations, and up to 20% when using system combination. We also discuss the quality of the shared task datasets.


Bootstrapping Unsupervised Bilingual Lexicon Induction

Hauer, Nicolai, Kondrak 2017


The task of unsupervised lexicon induction is to find translation pairs across monolingual corpora. We develop a novel method that creates seed lexicons by identifying cognates in the vocabularies of related languages on the basis of their frequency and lexical similarity. We apply bidirectional bootstrapping to a method which learns a linear mapping between context-based vector spaces. Experimental results on three language pairs show consistent improvement over prior work.
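The seed-lexicon idea described above (pairing likely cognates by frequency and lexical similarity) can be sketched roughly as follows. This is an illustrative approximation only, not the authors' method. The function `seed_lexicon`, the use of `difflib.SequenceMatcher` as the similarity measure, and the `min_sim` and `max_rank_gap` thresholds are all assumptions made for the example. The bootstrapping and linear-mapping steps from the abstract are not covered.

```python
from difflib import SequenceMatcher


def seed_lexicon(src_ranked, tgt_ranked, min_sim=0.8, max_rank_gap=2):
    """Pair words that look alike and have similar frequency ranks.

    src_ranked / tgt_ranked: vocabularies sorted by descending corpus
    frequency, so a word's list index serves as its frequency rank.
    min_sim and max_rank_gap are hypothetical thresholds.
    """
    pairs = []
    used = set()  # target words already assigned to a source word
    for i, src in enumerate(src_ranked):
        lo = max(0, i - max_rank_gap)
        hi = min(len(tgt_ranked), i + max_rank_gap + 1)
        best, best_sim = None, min_sim
        # Only consider target words whose rank is close to the source rank.
        for tgt in tgt_ranked[lo:hi]:
            if tgt in used:
                continue
            sim = SequenceMatcher(None, src, tgt).ratio()
            if sim >= best_sim:
                best, best_sim = tgt, sim
        if best is not None:
            pairs.append((src, best))
            used.add(best)
    return pairs


if __name__ == "__main__":
    # Toy vocabularies, assumed to be frequency-ranked.
    print(seed_lexicon(["hotel", "water", "dog"], ["hotel", "wasser", "hund"]))
```

A greedy pass like this favours precision over recall: only near-identical words at similar ranks are paired, which suits a seed lexicon that a later stage is meant to expand.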


Garrett Nicolai

Inflection Generation as Discriminative String Transduction
