Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning 2015
DOI: 10.18653/v1/w15-2402
Reading metrics for estimating task efficiency with MT output

Abstract: We show that metrics derived from recording gaze while reading are better proxies for machine translation quality than automated metrics. With reliable eye-tracking technologies becoming available for home computers and mobile devices, such metrics are readily available even in the absence of representative held-out human translations. In other words, reading-derived MT metrics offer a way of getting cheap, online feedback for MT system adaptation.
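The abstract's core claim is that gaze metrics correlate with MT quality. A minimal sketch of how such a proxy could be checked, assuming a hypothetical per-sentence gaze metric (total fixation duration) and human adequacy ratings; the numbers are invented for illustration and this is not the paper's actual evaluation procedure:

```python
# Illustrative sketch (not the paper's method): correlate a per-sentence
# gaze metric with human quality ratings to gauge how well reading
# behaviour proxies MT quality. All data below are made up.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: more reading effort tends to accompany lower ratings.
total_fixation_ms = [1200, 950, 1800, 700, 1500]  # gaze metric per sentence
human_quality = [3.0, 4.0, 1.5, 4.5, 2.0]         # adequacy rating per sentence

r = pearson(total_fixation_ms, human_quality)
print(round(r, 2))  # strongly negative correlation on this toy data
```

A negative correlation here would mean longer reading times indicate worse translations, which is the direction the reading-metric proxy relies on.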

Cited by 17 publications (19 citation statements)
References 6 publications
“…Direct assessments of adequacy and MT ranking are the official evaluation procedure for the most recent WMT translation shared task campaigns (Bojar et al., 2016, 2017). Other researchers use post-task questionnaires (Stymne et al., 2012; Doherty and O'Brien, 2014; Klerke et al., 2015; Castilho and O'Brien, 2016) to assess the perceived usefulness of MT output. Direct assessment, ranking or post-task questionnaire evaluation methods are clearly subjective and require informants to make "in vitro" judgements about the quality of MT outputs, without considering their usefulness for a specific "in vivo", real-world application.…”
Section: Evaluation of MT for Gisting
confidence: 99%
“…These two observations recently led Klerke et al. (2015) to suggest using eye-tracking measures as metrics in text simplification. We go beyond this by suggesting that eye-tracking recordings can be used to induce better models for sentence compression for text simplification.…”
Section: Introduction
confidence: 96%
“…Sentence compression is a basic operation in text simplification which has the potential to improve statistical machine translation and automatic summarization (Berg-Kirkpatrick et al., 2011; Klerke et al., 2015), as well as helping poor readers in need of assistive technologies (Canning et al., 2000). This work suggests using eye-tracking recordings for improving sentence compression for text simplification systems and is motivated by two observations: (i) Sentence compression is the task of automatically making sentences easier to process by shortening them.…”
Section: Introduction
confidence: 99%
“…This relationship between text and eye movements has led to an influx of studies investigating the use of eye-tracking data to improve and test computational models of language, e.g., Barrett et al. (2016); Demberg and Keller (2008); Klerke et al. (2015). In this study we aim to incorporate eye-movement data for the task of automatic readability assessment.…”
Section: Introduction
confidence: 99%
“…It can also be used to assess the performance of machine translation, text simplification and language generation systems. Eye-tracking data has previously been used to evaluate readability models (Green, 2014; Klerke et al., 2015); however, our main contribution is to explore the way that eye-tracking data can help improve models for readability assessment through multi-task learning (Caruana, 1997) and parser metrics based on the surprisal theory of syntactic complexity (Hale, 2001, 2016). Multi-task learning allows the model to learn various tasks in parallel and improve performance by sharing parameters in the hidden layers.…”
Section: Introduction
confidence: 99%