Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
DOI: 10.18653/v1/2021.emnlp-main.239
Is this the end of the gold standard? A straightforward reference-less grammatical error correction metric

Abstract: It is difficult to rank and evaluate the performance of grammatical error correction (GEC) systems, as a sentence can be rewritten in numerous correct ways. A number of GEC metrics have been used to evaluate proposed GEC systems; however, each metric relies on either a comparison with one or more reference texts (the so-called gold standard for reference-based metrics) or a separate annotated dataset to fine-tune the reference-less metric. Reference-based systems have a low correlation with human judgement…

Cited by 2 publications (1 citation statement)
References 13 publications (24 reference statements)
“…Scribendi Score. The Scribendi Score (Islam and Magnani 2021) was designed to be simpler than other reference-less metrics in that it requires neither an existing GEC system nor fine-tuning. Instead, it calculates an absolute score (1 = positive, -1 = negative, 0 = no change) from a combination of language model perplexity (GPT-2; Radford et al. 2019) and sorted-token/Levenshtein distance ratios, which respectively ensure that i) the corrected sentence is more probable than the original and ii) the two sentences are not significantly different from each other.…”
Section: Reference-less Metrics (citation type: mentioning)
confidence: 99%
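
The citation statement above describes the metric only at a high level. As a rough illustration, the following minimal Python sketch implements a Scribendi-style reference-less check under several assumptions that go beyond this page: GPT-2 perplexity via the Hugging Face transformers library, a character-level Levenshtein ratio, a 0.8 similarity threshold, and taking the maximum of the two ratios as the combination rule. The function name scribendi_style_score and these parameter choices are illustrative, not the authors' published configuration.

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    # Mean cross-entropy of the sentence under GPT-2, exponentiated.
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def levenshtein(a: str, b: str) -> int:
    # Plain dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    # 1.0 for identical strings, decreasing as the edit distance grows.
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

def scribendi_style_score(source: str, correction: str, threshold: float = 0.8) -> int:
    # Returns 1 (accepted correction), -1 (rejected), or 0 (no change),
    # mirroring the 1 / -1 / 0 scheme described in the citation above.
    if source == correction:
        return 0
    # i) the correction should be more probable (lower perplexity) than the source
    if perplexity(correction) >= perplexity(source):
        return -1
    # ii) the two sentences should not differ too much: a Levenshtein-distance
    #     ratio plus a sorted-token variant that ignores word reordering
    lev_ratio = similarity(source, correction)
    tok_ratio = similarity(" ".join(sorted(source.split())),
                           " ".join(sorted(correction.split())))
    return 1 if max(lev_ratio, tok_ratio) >= threshold else -1

print(scribendi_style_score("He go to school yesterday.", "He went to school yesterday."))

Used this way, the score rewards corrections that lower perplexity without straying far from the original wording, which is the trade-off the citation statement attributes to the metric.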