Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Guijin Son · Donghun Yang · Hitesh Laxmichand Patel · Amit Agarwal · Hyunwoo Ko · Chanuk Lim · Srikant Panda · Minhyuk Kim · Nikunj Drolia · Dasol Choi · Kyong-Ha Lee · Youngjae Yu
Abstract
Recent frontier models employ long chain-of-thought reasoning to explore solution spaces in context and achieve stronger performance. While many works study distillation to build smaller yet capable models, most focus on English, and little is known about language-specific reasoning. To bridge this gap, we first introduce **Language-Mixed CoT**, a reasoning schema that switches between English and a target language, using English as an anchor to excel in reasoning while minimizing translation artifacts. As a Korean case study, we curate **Yi-Sang**: 5.79M native-Korean prompts from web Q&A, exams, STEM, and code; 3.7M long reasoning traces generated by Qwen3-32B; and **Yi-Sang-HQ**, a targeted 260K high-yield subset. We train nine models (4B–35B) across six model families (Qwen2.5, Llama-3.1, Gemma-3, etc.). Our best model, **KO-REAson-35B**, achieves state-of-the-art performance, with the highest overall average score ($64.0_{\pm2.5}$), ranking first on 5/9 benchmarks and second on the remainder. Smaller and mid-sized models also benefit substantially, with an average improvement of $+18.6$ points across the nine evaluated benchmarks. Ablations show **Language-Mixed CoT** is more effective than monolingual CoT, indicating that reasoning patterns can be engineered to improve non-English performance. We release our data-curation pipeline, evaluation system, datasets, and models to advance research on language-specific reasoning.
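To make the **Language-Mixed CoT** schema concrete, the sketch below shows one plausible way to (a) check that a distilled trace genuinely mixes English (anchor) and Korean (target) text and (b) pack it into a chat-style SFT record. This is a hypothetical illustration, not the released pipeline: the Hangul/Latin ratio heuristic, the 0.05 threshold, the `<think>` delimiter, and the record format are all assumptions for exposition.

```python
# Hypothetical sketch of filtering and packaging Language-Mixed CoT traces.
# Thresholds, field names, and the <think> format are illustrative assumptions,
# not the paper's released data schema.
import re

HANGUL = re.compile(r"[\uac00-\ud7a3]")  # Korean syllable block (target language)
LATIN = re.compile(r"[A-Za-z]")          # English letters (anchor language)


def is_language_mixed(trace: str, min_ratio: float = 0.05) -> bool:
    """Return True if the trace contains a non-trivial share of both scripts."""
    letters = [c for c in trace if HANGUL.match(c) or LATIN.match(c)]
    if not letters:
        return False
    ko = sum(bool(HANGUL.match(c)) for c in letters) / len(letters)
    en = sum(bool(LATIN.match(c)) for c in letters) / len(letters)
    return ko >= min_ratio and en >= min_ratio


def to_training_example(prompt_ko: str, trace: str, answer_ko: str) -> dict:
    """Pack a Korean prompt, a language-mixed reasoning trace, and the final
    Korean answer into a chat-style SFT record (assumed format)."""
    return {
        "messages": [
            {"role": "user", "content": prompt_ko},
            {
                "role": "assistant",
                "content": f"<think>\n{trace}\n</think>\n{answer_ko}",
            },
        ]
    }


if __name__ == "__main__":
    # Example trace that reasons in English but keeps Korean terms and steps.
    trace = "We need the roots of x^2 - 5x + 6. 인수분해하면 (x-2)(x-3)=0, so x = 2 or 3."
    print(is_language_mixed(trace))  # True: both Hangul and Latin characters appear
```

A filter of this kind would sit between trace generation and the high-yield subset selection: traces that collapse into a single language could be dropped so that training data reflects the intended mixed-language reasoning pattern.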