BIBI System Description: Building with CNNs and Breaking with Deep Reinforcement Learning

Li, Yitong; Cohn, Trevor; Baldwin, Timothy

doi:10.18653/v1/w17-5404

Cited by 2 publications

(4 citation statements)

References 11 publications

“…Challenge sets have been used for several tasks (Li et al (2017); McCoy and Linzen (2019); Ravichander et al (2021), inter alia) to investigate the behaviour of these tasks under a specific phenomenon rather than the standard test distribution (Popović and Castilho, 2019). Lately, with the success of neural metrics, the development of challenge sets for MT evaluation has promoted great interest in studying the strengths and weaknesses of these metrics.…”

Section: Related Work (mentioning)

confidence: 99%

“…Challenge sets exist for a range of natural language processing (NLP) tasks including Sentiment Analysis (Li et al, 2017; Mahler et al, 2017; Staliūnaitė and Bonfil, 2017), Natural Language Inference (McCoy and Linzen, 2019; Rocchietti et al, 2021), Question Answering (Ravichander et al, 2021), Machine Reading Comprehension (Khashabi et al, 2018), Machine Translation (MT) (King and Falkedal, 1990; Isabelle et al, 2017), and the more specific task of pronoun translation in MT (Guillou and Hardmeier, 2016).…”

Section: Introduction (mentioning)

confidence: 99%

ACES: Translation Accuracy Challenge Sets at WMT 2023

Amrhein, Moghe, Guillou

2023

Proceedings of the Eighth Conference on Machine Translation

We benchmark the performance of segment-level metrics submitted to WMT 2023 using the ACES Challenge Set (Amrhein et al., 2022). The challenge set consists of 36K examples representing challenges from 68 phenomena and covering 146 language pairs. The phenomena range from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. For each metric, we provide a detailed profile of performance over a range of error categories as well as an overall ACES-Score for quick comparison. We also measure the incremental performance of the metrics submitted to both WMT 2023 and 2022. We find that 1) there is no clear winner among the metrics submitted to WMT 2023, and 2) performance change between the 2023 and 2022 versions of the metrics is highly variable. Our recommendations are similar to those from WMT 2022. Metric developers should focus on: building ensembles of metrics from different design families, developing metrics that pay more attention to the source and rely less on surface-level overlap, and carefully determining the influence of multilingual embeddings on MT evaluation.

“…The builder team from University of Melbourne (which also participated as a breaker team), contributed two sentiment analysis systems consisting of convolutional neural networks. One CNN was trained on data labeled at the phrase level (PCNN), and the other was trained on data labeled at the sentence level (SCNN) (Li et al, 2017).…”

Section: University of Melbourne CNNs (mentioning)

confidence: 99%

“…The breaker team from University of Melbourne opted to generate test minimal pairs automatically, borrowing from methods for generating adversarial examples in computer vision. They used reinforcement learning, optimizing on reversed labels, to identify tokens or phrases to be changed, and then applied a substitution method (Li et al, 2017). Some human supervision was used to ensure grammaticality and correct labeling of the sentences.…”

Section: University of Melbourne (mentioning)

confidence: 99%

Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task

Ettinger, Rao, Daumé et al.

2017

Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems, and the associated Build It Break It, The Language Edition shared task. The goal of this workshop was to bring together researchers in NLP and linguistics with a shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, provide discussion of some highlighted results, and discuss lessons learned.
