2021
DOI: 10.48550/arxiv.2105.09680
Preprint

KLUE: Korean Language Understanding Evaluation

Abstract: We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks. […]

Cited by 12 publications (22 citation statements)
References 91 publications
“…The baseline for this study was a model trained on open-source datasets during both training and evaluation. The dataset used for the large classification baseline was the KLUE-NER dataset (Korean Language Understanding Evaluation dataset for Named Entity Recognition) [24], a large-scale Korean dataset constructed for named entity recognition. […] In token classification tasks, when the text "The defendant lived with the victim, Ms. Lee (female, 50 years old)…" is input, the model first splits the text on spaces and learns each token's role within the sentence or paragraph and its relationship to the surrounding words. The model then computes the probability that each token is key information and predicts the highest-probability key information.…”
Section: Experimental Results of Benchmark Models (mentioning)
confidence: 99%
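As an illustration of the token-classification procedure this statement describes, the sketch below scores every token of the example sentence with a fine-tuned NER model and keeps the highest-probability label per token. It is a minimal sketch, not the cited baseline: the checkpoint name "my-klue-ner-model" is a placeholder for any KLUE-NER fine-tuned model, and it relies on the Hugging Face transformers API.

```python
# Minimal sketch of per-token NER scoring as described above; illustrative only.
# "my-klue-ner-model" is a placeholder for any KLUE-NER fine-tuned checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("my-klue-ner-model")
model = AutoModelForTokenClassification.from_pretrained("my-klue-ner-model")

text = "The defendant lived with the victim, Ms. Lee (female, 50 years old)"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)

probs = logits.softmax(dim=-1)               # probability of each label per token
pred_ids = probs.argmax(dim=-1)[0]           # highest-probability label per token

for token_id, label_id in zip(inputs["input_ids"][0], pred_ids):
    token = tokenizer.convert_ids_to_tokens(token_id.item())
    print(token, model.config.id2label[label_id.item()])
```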
“…The baseline for this study was a model trained on open-source datasets during both training and evaluation. The dataset used for the large classification baseline was the KLUE-NER dataset (Korean Language Understanding Evaluation dataset for Named Entity Recognition) [24], a large-scale Korean dataset constructed for named entity recognition. […] If consecutive tokens are classified as the same key information, the BIO tagging scheme is used to mark the Beginning (B), Inside (I), and Outside (O) of each entity.…”
Section: Experimental Results of Benchmark Models (mentioning)
confidence: 99%
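The snippet below is a small, self-contained illustration of the BIO scheme described above: consecutive tokens tagged with the same entity type are merged into one span. The tokens and label names (PS, GENDER, AGE) are invented for the example and do not come from the cited dataset.

```python
# Illustration of the BIO scheme: consecutive tokens of one entity are grouped
# into a single span. Tokens and label names are invented for this example.
tokens = ["Ms.", "Lee", "female", "50", "years", "old"]
tags   = ["B-PS", "I-PS", "B-GENDER", "B-AGE", "I-AGE", "I-AGE"]

def bio_to_spans(tokens, tags):
    """Collect (entity_type, text) spans from BIO tags; "O" tokens fall outside any entity."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:  # "O" or an inconsistent I- tag closes the current span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]

print(bio_to_spans(tokens, tags))
# [('PS', 'Ms. Lee'), ('GENDER', 'female'), ('AGE', '50 years old')]
```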
“…We choose KLUE-BERT-base [14], KoELECTRA [15], KorSciBERT, and KorSciElectra (accessed on 30 December 2022) as the pre-trained language models to which we add extension vocabulary modules. We feed the final-layer output for the first input token into a linear layer for prediction.…”
Section: Results (mentioning)
confidence: 99%
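A minimal sketch of the classification head this statement describes, assuming the klue/bert-base checkpoint on the Hugging Face Hub (the public release of KLUE-BERT-base); the number of labels is arbitrary, and the extension vocabulary modules mentioned in the quote are omitted here.

```python
# Sketch of a sentence-classification head: the final-layer output for the first
# input token is passed to a linear layer. "klue/bert-base" is the public
# KLUE-BERT-base checkpoint; num_labels is an arbitrary illustrative value.
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class FirstTokenClassifier(nn.Module):
    def __init__(self, model_name="klue/bert-base", num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state   # (batch, seq_len, hidden)
        first_token = hidden[:, 0, :]                        # output for the first token
        return self.classifier(first_token)                  # (batch, num_labels)

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")
model = FirstTokenClassifier()
batch = tokenizer(["예시 문장입니다."], return_tensors="pt")
print(model(**batch).shape)  # torch.Size([1, 3])
```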
“…A few SQuAD-format datasets have been released in non-English languages. Some examples are KorQuAD 1.0 [27], KorQuAD 2.0 [26], KLUE-MRC [15], FQuAD 1.1 [6], GermanQuAD [13], and SberQuAD [7]. KorQuAD 1.0 is a Korean QA dataset that contains over 70k samples.…”
Section: Reading Comprehension in Other Languages (mentioning)
confidence: 99%
“…Computing word-level F1 is not trivial in Japanese because Japanese sentences do not have spaces. We chose a character-level F1 score as our evaluation metric, following the use of character-based evaluation metrics in Korean QA datasets [27,26,15]. Because Japanese uses thousands of kanji (Chinese characters) and each kanji carries meaning, the probability of two phrases coincidentally overlapping by character is low when the two phrases have different meanings.…”
Section: Dataset Evaluation (mentioning)
confidence: 99%
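For reference, a character-level F1 of the kind referred to here can be computed as a bag-of-characters overlap between the prediction and the gold answer. This is an illustrative sketch only; the exact normalization rules (whitespace, punctuation) differ between the cited datasets.

```python
# Minimal character-level F1: compare prediction and gold answer as bags of
# characters, ignoring whitespace. Illustrative sketch; normalization details
# vary across the cited QA datasets.
from collections import Counter

def char_f1(prediction: str, gold: str) -> float:
    pred_chars = Counter(prediction.replace(" ", ""))
    gold_chars = Counter(gold.replace(" ", ""))
    overlap = sum((pred_chars & gold_chars).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_chars.values())
    recall = overlap / sum(gold_chars.values())
    return 2 * precision * recall / (precision + recall)

print(char_f1("피고인은 피해자와 동거하였다", "피해자와 동거"))  # partial overlap -> between 0 and 1
```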