“…We evaluate our models with lm-evaluation-harness (Gao et al., 2023) on both English and Korean language tasks, such as boolean question answering (BoolQ; Clark et al., 2019), commonsense causal reasoning (COPA; Roemmele et al., 2011), context-sensitive word understanding (WiC; Pilehvar and Camacho-Collados, 2019), commonsense reasoning (HellaSwag; Zellers et al., 2019), and sentiment negation recognition (SentiNeg). From this evaluation, we observe that our models outperform recent open Korean pre-trained LLMs such as OPEN-SOLAR-KO-10.7B (L. Junbum, 2024), Polyglot-Ko (Ko et al., 2023), and KoGPT (Kim et al., 2021), while preserving the strong English benchmark performance of the base English-centric LLMs, and they rank as the leading Korean pre-trained model on the Open Ko-LLM Leaderboard.…”
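An evaluation along these lines can be reproduced with lm-evaluation-harness's Python API. The sketch below is illustrative rather than the authors' exact configuration: the checkpoint name is a placeholder, and the Korean tasks are assumed to be the KoBEST variants registered in the harness (`kobest_boolq`, `kobest_copa`, and so on), which cover the Korean counterparts of the benchmarks named above.

```python
# Minimal sketch of the evaluation setup described in the excerpt,
# using the lm-evaluation-harness Python API. The model path below is
# a placeholder, not the authors' released checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=your-org/your-korean-llm",  # hypothetical checkpoint
    tasks=[
        # English tasks
        "boolq", "copa", "wic", "hellaswag",
        # Korean counterparts from the KoBEST suite
        "kobest_boolq", "kobest_copa", "kobest_wic",
        "kobest_hellaswag", "kobest_sentineg",
    ],
    batch_size=8,
)

# Print per-task metrics (accuracy, F1, etc., depending on the task).
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can also be launched from the harness's `lm_eval` command-line entry point with equivalent `--model`, `--model_args`, and `--tasks` arguments.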