2021
DOI: 10.48550/arxiv.2112.13610
Preprint
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark

Cited by 4 publications (6 citation statements); References 0 publications.
“…Comprising nine Chinese NLU tasks, the CLUE dataset evaluates LLMs in tasks like semantic matching, text classification, and reading comprehension. CUGE (Yao et al., 2021) is organized hierarchically by a language-task-dataset structure, using 21 sub-datasets to evaluate LLMs in language understanding, information retrieval, Q&A, and language generation. SentEval (Conneau and Kiela, 2018) aggregates NLU datasets for 21 sub-tasks.…”
Section: Natural Language Understanding
Confidence: 99%
“…For Chinese NLU, the CLUE benchmark is proposed with more than 10 tasks, covering most NLP problems. To evaluate the ability of pre-trained language models in both natural language understanding and generation, CUGE (Yao et al., 2021) is proposed, designed as a hierarchical framework with a multi-level scoring strategy. Meanwhile, to evaluate whether language models can learn linguistic phenomena of Chinese, Xiang et al. (2021) develop CLiMP, which covers 9 major Mandarin linguistic phenomena.…”
Section: Benchmarks for Pre-trained Language Models
Confidence: 99%
“…Meanwhile, FSPC (Shao et al., 2021) and CCMP are proposed for ancient poem understanding. While CUGE (Yao et al., 2021) uses CCMP as a sub-task for classical poetry matching, in this work we apply the FSPC dataset for poetry emotion recognition.…”
Section: Benchmarks for Pre-trained Language Models
Confidence: 99%
“…Chinese LLM Benchmarks. There have been important efforts, such as CLUE (Xu et al., 2020) and CUGE (Yao et al., 2021), to evaluate pre-trained language models on extensive tasks in the Chinese context, following the traditional taxonomy of natural language understanding and generation. As these benchmarks are restricted in their prediction formats and cannot fully measure the cross-task generalization of LLMs in free-form outputs, more recent studies (Huang et al., 2023b) propose to reformat the tasks into multi-choice question answering, mostly examining knowledge-based abilities in Chinese.…”
Section: Related Work
Confidence: 99%