William B. Dolan scite author profile

This paper presents a pilot study of the use of phrasal Statistical Machine Translation (SMT) techniques to identify and correct writing errors made by learners of English as a Second Language (ESL). Using examples of mass noun errors found in the Chinese Learner Error Corpus (CLEC) to guide creation of an engineered training set, we show that application of the SMT paradigm can capture errors not well addressed by widely-used proofing tools designed for native speakers. Our system was able to correct 61.81% of mistakes in a set of naturallyoccurring examples of mass noun errors found on the World Wide Web, suggesting that efforts to collect alignable corpora of pre-and post-editing ESL writing samples offer can enable the development of SMT-based writing assistance tools capable of repairing many of the complex syntactic and lexical problems found in the writing of ESL learners.

show abstract

Extracting Lexically Divergent Paraphrases from Twitter

Ritter

Callison-Burch

et al. 2014

TACL

View full text Add to dashboard Cite

We present MultiP (Multi-instance Learning Paraphrase Model), a new model suited to identify paraphrases within the short messages on Twitter. We jointly model paraphrase relations between word and sentence pairs and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve the performance competitive with a state-of-the-art method which combines a latent space model with a feature-based supervised classifier. Our model also captures lexically divergent paraphrases that differ from yet complement previous methods; combining our model with previous work significantly outperforms the state-of-the-art. In addition, we present a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter. We make this new dataset available to the research community.

show abstract

What Syntax Can Contribute in the Entailment Task

Vanderwende

Dolan

2006

View full text Add to dashboard Cite

This paper describes the PASCAL Network of Excellence Recognising Textual Entailment (RTE) Challenge benchmark 1 . The RTE task is defined as recognizing, given two text fragments, whether the meaning of one text can be inferred (entailed) from the other. This applicationindependent task is suggested as capturing major inferences about the variability of semantic expression which are commonly needed across multiple applications. The Challenge has raised noticeable attention in the research community, attracting 17 submissions from diverse groups, suggesting the generic relevance of the task.

show abstract

Using Statistical Techniques and Web Search to Correct ESL Errors

Gamon¹,

Leacock²,

Brockett³

et al. 2013

CALICO

View full text Add to dashboard Cite

In this paper we present a system for automatic correction of errors made by learners of English. The system has two novel aspects. First, machine-learned classifiers trained on large amounts of native data and a very large language model are combined to optimize the precision of suggested corrections. Second, the user can access real-life web examples of both their original formulation and the suggested correction. We discuss technical details of the system, including the choice of classifier, feature sets, and language model. We also present results from an evaluation of the system on a set of corpora. We perform an automatic evaluation on native English data and a detailed manual analysis of performance on three corpora of nonnative writing: the Chinese Learners' of English Corpus (CLEC) and two corpora of web and email writing.

show abstract

Overcoming the customization bottleneck using example-based MT

Richardson

Dolan

Menezes

et al. 2001

View full text Add to dashboard Cite

We describe MSR-MT, a large-scale hybrid machine translation system under development for several language pairs. This system's ability to acquire its primary translation knowledge automatically by parsing a bilingual corpus of hundreds of thousands of sentence pairs and aligning resulting logical forms demonstrates true promise for overcoming the so-called MT customization bottleneck. Trained on English and Spanish technical prose, a blind evaluation shows that MSR-MT's integration of rule-based parsers, example based processing, and statistical techniques produces translations whose quality exceeds that of uncustomized commercial MT systems in this domain.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

William B. Dolan

Correcting ESL errors using phrasal SMT techniques

Extracting Lexically Divergent Paraphrases from Twitter

What Syntax Can Contribute in the Entailment Task

Using Statistical Techniques and Web Search to Correct ESL Errors

Overcoming the customization bottleneck using example-based MT

Contact Info

Product

Resources

About