rStar2-Agent-14B
DAPO-with-prompt-augmentation-step2820
DAPO-with-prompt-augmentation-step2720
DAPO-with-prompt-augmentation-step2480
Weak-Driven-Learning
REAL-Prover
Crystal-Think-V2
Nebula-S-v1
Qwen2.5-0.5B-DAPO-math-reasoning
Qwen2.5-1.5B-DAPO-math-reasoning
PrAg-PO-Qwen3-1.7b-step720
Qwen3-4B-DAPO-math-reasoning
Qwen2.5-3B-DAPO-math-reasoning
Qwen2.5-1.5B-RLOO-math-reasoning
Qwen3-1.7B-DAPO-math-reasoning
Qwen3-1.7B-GRPO-math-reasoning
Qwen2.5-3B-RLOO-math-reasoning
Qwen3-4B-RLOO-math-reasoning
Qwen2.5-0.5B-RLOO-math-reasoning
Qwen3-1.7B-RLOO-math-reasoning
Qwen3-4B-GRPO-KL-math-reasoning
Qwen2.5-7B-Instruct-borg-merge-v1
UniReason-Qwen3-14B-RL
Qwen3-1.7B-ReMax-math-reasoning
Qwen3-4B-GRPO-math-reasoning
Qwen3-1.7B-GRPO-KL-math-reasoning
Qwen3-4B-ReMax-math-reasoning
Qwen2.5-3B-ReMax-math-reasoning
Qwen2.5-1.5B-GRPO-KL-math-reasoning
Qwen2.5-0.5B-GRPO-math-reasoning
Qwen2.5-1.5B-ReMax-math-reasoning
Qwen2.5-1.5B-GRPO-math-reasoning
Qwen2.5-0.5B-GRPO-KL-math-reasoning
Qwen2.5-0.5B-ReMax-math-reasoning
Qwen2.5-3B-GRPO-KL-math-reasoning
llama31-8bn_SFT
Qwen2.5-3B-GRPO-math-reasoning
UniReason-Qwen3-14B-think-SFT
Qwen3-4B-Inst-Math-Reasoning-SFT
Cclilqwen