Deepseek-R1 has been a hot topic in recent days. Our team at OnelineAI has been working on something quite similar: Figure 1 shows a response from a model we trained a week ago (two days before the release of R1), and it exhibits similar self-correcting behavior. This was an extension of our work on OLV (Olaf-With-Verification).
(See the bottom of this post for a link to the model.)

Now, with Deepseek-R1 public and our own work outdated, we raced to release the first Korean re-implementation of R1. Earlier this month, in our paper (Ko et al., 2025), we discussed a three-step generation process in which models first understand (or solve) a problem in English and then translate the result into Korean. We found this applicable to R1 as well, since it also responds in two steps: <think> and <solution>. Most of the solving happens in the <think> step, which is followed by a <solution> step that restates the answer for readability. Accordingly, in our implementation of Ko-R1, the model first thinks in English and then responds in Korean.
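The two-stage output format is easy to work with programmatically. Below is a minimal sketch of a parser for responses wrapped in <think>/<solution> tags; the function and example are ours, not the exact code used in our pipeline.

```python
import re

def split_response(text: str) -> dict:
    """Split an R1-style response into its reasoning and answer parts.

    Assumes the model wraps its English reasoning in <think>...</think>
    and the Korean-facing answer in <solution>...</solution>.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    solution = re.search(r"<solution>(.*?)</solution>", text, re.DOTALL)
    return {
        "think": think.group(1).strip() if think else "",
        "solution": solution.group(1).strip() if solution else "",
    }

# Toy example in the same shape as the transcript below.
example = (
    "<think>4 = 2^2, so 2^8 = 2^(2x) and x = 4.</think>"
    "<solution>x = 4</solution>"
)
parts = split_response(example)
```

Non-greedy matching with `re.DOTALL` keeps the split robust even when the reasoning spans many paragraphs.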
Example model response:
Q: $2^8=4^x$일 때, $x$의 값은 얼마입니까? (If $2^8 = 4^x$, what is the value of $x$?)
Response by ko-r1-preview-1.5B:
<think>
Okay, so I need to solve the equation 2^8 = 4^x and find the value of x. Let me think about how to approach this step by step.

First, I remember that when dealing with exponential equations, it's often helpful to express both sides of the equation with the same base if possible. That way, I can set the exponents equal to each other because if a^b = c^d, then b must equal d if a and c are the same. So, let's see if I can rewrite 4 as a power of 2. Since 4 is 2 squared, right? Because 2 times 2 is 4. So, 4 is 2^2.

So substituting that into the original equation, we have:

2^8 = (2^2)^x

Now, using the exponentiation rule (a^m)^n = a^(m*n), which means multiplying the exponents. Applying that here, the right side becomes 2^(2*x). So now the equation is:

2^8 = 2^(2x)

Since the bases are the same (both are powers of 2), their exponents must be equal for the equality to hold. Therefore, I can set the exponents equal to each other:

8 = 2x

To solve for x, I just divide both sides by 2:

x = 8 / 2
x = 4

Wait, let me check that again. If x is 4, then 4^4 would be 256, and 2^8 is also 256. Yeah, that works out. But maybe there's another way to verify?

Alternatively, I could take logarithms on both sides. Let's try that method to confirm. Taking the natural log or common log of both sides. Let's use natural logs for this example.

Taking ln of both sides:

ln(2^8) = ln(4^x)

Using the logarithm property that ln(a^b) = b*ln(a):

8*ln(2) = x*ln(4)

Then solving for x:

x = (8*ln(2)) / ln(4)

But since 4 is 2^2, ln(4) is ln(2^2) = 2*ln(2). Substituting back in:

x = (8*ln(2)) / (2*ln(2)) = 8/2 = 4

Same result. So that confirms it. Both methods give x = 4.

Another way to look at it: Maybe converting both numbers to base 10? Although that might not be necessary here. Let's see. 2^8 is 256, and 4^x. If I compute 4^x until I reach 256, starting from x=1: 4, 16, 64, 256. So that's 4^4=256. Yep, so x=4.

I guess all these different approaches—expressing both sides with the same base, taking logarithms, and iterative checking—all lead to the same answer. It seems pretty straightforward once you recognize that 4 is a power of 2. The key was to express everything in terms of the same base before equating the exponents.

Just to recap: Original equation is 2^8 equals 4^x. Express 4 as 2^2, substitute into the equation, simplify using exponent rules, set exponents equal, solve for x. All steps check out, and verifying through multiple methods gives confidence in the solution.
</think>
<solution>
\(2^8 = 4^x\)을 풀기 위해, 우리는 양쪽을 같은 밑으로 표현합니다. \(4\)를 \(2^2\)로 다시 씁니다:

\[
2^8 = (2^2)^x
\]

지수 법칙 \((a^m)^n = a^{mn}\)을 사용하여 오른쪽을 단순화하면:

\[
2^8 = 2^{2x}
\]

밑이 같으므로 지수는 같아야 합니다:

\[
8 = 2x
\]

\(x\)를 구하면:

\[
x = \frac{8}{2} = 4
\]

검증:
- \(4^4 = 256\)이고 \(2^8 = 256\)이므로 해가 맞음을 확인할 수 있습니다.
- 로그를 사용한 검증도 결과를 확인합니다.

따라서, \(x\)의 값은 \(\boxed{4}\)입니다.
</solution>
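The arithmetic in the transcript can be checked mechanically. This is a quick sanity script mirroring the model's two verification routes, not part of our pipeline:

```python
import math

# Direct route: 2^8 and 4^4 are the same number.
assert 2**8 == 4**4 == 256

# Logarithm route from the <think> step: x = 8*ln(2) / ln(4).
x = 8 * math.log(2) / math.log(4)
assert abs(x - 4) < 1e-12
```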
To gauge the model's performance, we evaluated it on GSM8K, MATH, and Omni-MATH, subsets of HRM8K. (The KSM subset is due for a small dataset update; we will evaluate it and post the results later.) The hyperparameters used for evaluation are as follows.
The model outperforms models of similar size and even far larger ones. It struggles only on GSM8K, which echoes the findings of Chen et al., 2025: o1-like LLMs trained to generate longer sequences tend to overthink easier questions.
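Math benchmarks like these are typically scored by comparing the final `\boxed{...}` answer against the reference. Below is an illustrative sketch of such an extractor (the helper name is ours, not our exact harness); it handles nested braces, which a plain regex would not.

```python
from typing import Optional

def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a response,
    tracking brace depth so nested braces (e.g. \\boxed{\\frac{8}{2}})
    are captured whole. Returns None if no \\boxed{...} is present."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1
    out = []
    while i < len(text):
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(c)
        i += 1
    return "".join(out)

# Works on the Korean <solution> output shown earlier.
answer = extract_boxed(r"따라서, \(x\)의 값은 \boxed{4}입니다.")
```

Depth tracking matters because answers such as fractions contain braces of their own.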

The Ko-R1-1.5B model and the evaluation result files are available at the link below, and larger models will be released soon!
Enjoy the model here -> [Link to Model]