RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning 2017
DOI: 10.26615/978-954-452-049-6_026

Towards Replicability in Parsing

Abstract: We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation and the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies. All of those choices need to be carefully documented if we want to ensure replicability.
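As an illustration of one of the factors the abstract names, the rare word threshold is typically applied by replacing all training words below a frequency cutoff with a generic unknown token before the parser's lexicon is built; the exact cutoff chosen can shift parsing accuracy. The function name, the `<UNK>` token, and the cutoff value below are illustrative assumptions, not the paper's actual setup:

```python
from collections import Counter

def apply_rare_word_threshold(sentences, threshold):
    """Replace words whose corpus frequency is below `threshold`
    with a generic <UNK> token (hypothetical preprocessing sketch;
    real parsers differ in token choice and cutoff)."""
    counts = Counter(w for sent in sentences for w in sent)
    return [
        [w if counts[w] >= threshold else "<UNK>" for w in sent]
        for sent in sentences
    ]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
# "cat" and "dog" each occur once; with threshold 2 both map to <UNK>
print(apply_rare_word_threshold(corpus, 2))
# → [['the', '<UNK>', 'sat'], ['the', '<UNK>', 'sat']]
```

Because two runs of the same parser with different cutoffs effectively train on different vocabularies, this is exactly the kind of choice the paper argues must be documented for replicability.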


Cited by 5 publications (4 citation statements)
References 15 publications
“…Branco et al (2016), Branco et al (2018) and Branco et al (2020)), encouragingly, there is an increasing awareness around the need for reproducibility in the field (e.g. Dakota and Kübler (2017), Wieling et al (2018) and Cohen et al (2018)), while some of the major conferences (e.g. ACL-IJCNLP) now encourage authors to submit supplementary material to facilitate others in reproducing their results.…”
Section: Discussion
confidence: 97%
“…According to Fokkens et al (2013) and Wieling et al (2018), the main challenge is the unavailability of the source code and data. Dakota and Kübler (2017) study reproducibility for text mining. They show that 80% of the failed reproduction attempts were due to the lack of information about the datasets.…”
Section: Reproducibility in NLP
confidence: 99%
“…Using the Berkeley parser (Petrov and Klein 2007), the Trance parser (Watanabe and Sumita 2015) and the Berkeley neural parser (Kitaev and Klein 2018; Kitaev et al 2019), we train and evaluate on the phrase structure Korean Sejong treebank. As shown in Dakota and Kübler (2017) for other languages and treebanks, we define the best practices for replicability in constituent parsing for Korean, including correct comparison to future work by proposing the standard corpus division for the Sejong treebank. The other main contribution of this paper is to present two important factors in constituent parsing for Korean: detailed qualitative and quantitative parsing error analyses (Sections 3 and 4, respectively) using the Sejong treebank.…”
Section: Goal of the Paper
confidence: 99%