Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, 2021
DOI: 10.18653/v1/2021.acl-demo.34
ExplainaBoard: An Explainable Leaderboard for NLP

Abstract: With the rapid development of NLP research, leaderboards have emerged as one tool to track the performance of various systems on various NLP tasks. They are effective in this goal to some extent, but generally present a rather simplistic one-dimensional view of the submitted systems, communicated only through holistic accuracy numbers. In this paper, we present a new conceptualization and implementation of NLP evaluation: the EXPLAINABOARD, which in addition to inheriting the functionality of the standard leaderboard…
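The core idea behind moving past a single holistic accuracy number is fine-grained, attribute-bucketed analysis: break the test set into buckets along an attribute (e.g. input length) and report performance per bucket. The sketch below illustrates that idea only; the function names, attribute choice, and bucket edges are hypothetical and are not ExplainaBoard's actual API.

```python
# Illustrative sketch of attribute-bucketed evaluation, the kind of
# fine-grained breakdown an explainable leaderboard reports instead of a
# single holistic accuracy. Names and bucket edges are hypothetical.
from collections import defaultdict

def bucket_accuracy(examples, predict, attribute, edges):
    """Group examples by an attribute value and report accuracy per bucket.

    examples  -- iterable of (input_text, gold_label) pairs
    predict   -- callable mapping input_text -> predicted label
    attribute -- callable mapping input_text -> numeric attribute (e.g. length)
    edges     -- sorted bucket upper boundaries, e.g. [10, 20, 40]
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, gold in examples:
        value = attribute(text)
        # Place the example in the first bucket whose upper edge exceeds it.
        bucket = next((e for e in edges if value < e), float("inf"))
        totals[bucket] += 1
        hits[bucket] += int(predict(text) == gold)
    return {b: hits[b] / totals[b] for b in sorted(totals)}

if __name__ == "__main__":
    # Example question: does accuracy degrade on longer inputs?
    data = [("short sentence", "A"), ("a somewhat longer sentence here", "B")]
    model = lambda text: "A"                     # stand-in for a real system
    length = lambda text: len(text.split())      # bucketing attribute
    print(bucket_accuracy(data, model, length, edges=[4, 8, 16]))
```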

Cited by 19 publications (17 citation statements) · References 31 publications
“…We strongly advocate for better methods to assess the capability of models for numerical reasoning. One such direction could be akin to Linzen (2020) who proposes a parallel evaluation paradigm that rewards models for possessing human-like generalization capabilities and Liu et al (2021) that augments current leaderboards with three extra dimensions of interpretability, interactivity, and reliability. We highly recommend for careful design of the benchmarks and better leaderboards to correctly measure progress in such complex tasks.…”
Section: Discussion (mentioning; confidence: 99%)
“…TextBox (Li et al 2021), an open-source library for text generation, provides a comprehensive and efficient framework for reproducing and developing text generation algorithms. Liu et al (2021) has released ExplainaBoard, which is a unified platform to evaluate interpretable, interactive and reliable capabilities of NLP systems. Photon (Zeng et al 2020) and DIALOGPT (Zhang et al 2020d) are two comprehensive systems for cross-domain text-to-SQL and conversational response generation tasks, respectively.…”
Section: Related Work (mentioning; confidence: 99%)
“…KYD (Google, 2021) also provides a web platform for data analysis but it mainly focuses on image data. ExplainaBoard (Liu et al, 2021a) presents an analysis platform while it focuses on system diagnostics.…”
Section: Related Work (mentioning; confidence: 99%)