“…We evaluate our models with lm-evaluation-harness (Gao et al., 2023) on both English and Korean language tasks, such as boolean question answering (BoolQ; Clark et al., 2019), commonsense causal reasoning (COPA; Roemmele et al., 2011), context-sensitive word understanding (WiC; Pilehvar and Camacho-Collados, 2019), commonsense reasoning (HellaSwag; Zellers et al., 2019), and sentiment negation recognition (SentiNeg). From this evaluation, we observe that our models outperform recent open Korean pre-trained LLMs such as OPEN-SOLAR-KO-10.7B (L. Junbum, 2024), Polyglot-Ko (Ko et al., 2023), and KoGPT (Kim et al., 2021), while preserving the strong English benchmark performance of the base English-centric LLMs, and they rank as the leading Korean pre-trained model on the Open Ko-LLM Leaderboard.…”
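An evaluation along these lines can be reproduced with lm-evaluation-harness's Python API. The sketch below is illustrative rather than the authors' exact configuration: the checkpoint name is a placeholder, and the Korean tasks are assumed to be the KoBEST variants registered in the harness (`kobest_boolq`, `kobest_copa`, and so on), which cover the Korean counterparts of the benchmarks named above.

```python
# Minimal sketch of the evaluation setup described in the excerpt,
# using the lm-evaluation-harness Python API. The model path below is
# a placeholder, not the authors' released checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=your-org/your-korean-llm",  # hypothetical checkpoint
    tasks=[
        # English tasks
        "boolq", "copa", "wic", "hellaswag",
        # Korean counterparts from the KoBEST suite
        "kobest_boolq", "kobest_copa", "kobest_wic",
        "kobest_hellaswag", "kobest_sentineg",
    ],
    batch_size=8,
)

# Print per-task metrics (accuracy, F1, etc., depending on the task).
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can also be launched from the harness's `lm_eval` command-line entry point with equivalent `--model`, `--model_args`, and `--tasks` arguments.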