1B-Instruct-Tulu-full
gemma-irpf-lei-qwen
ours_gemma_1b_output_dist_merged
llama2_7b_chat_resta_lr5e-5_y0.5
QuantumCoder-0.5B
Llama-3.1-8B_instruction
llama2_7b_chat_resta_lr5e-5
Mistral-7B-v0.3_mathv1
cs336-leaderboard
evolai-1.7b-thinking
benchmark-luckypick-7b-19
affine-5H4Ltd14NjCkVZ1PAkSF6jXMXo297hiGrgpMmvgNokfk8d2R
debatefloor-grpo-smoketest
Llama-3.2-3B-Instruct_base_grpo_rollout_8_resume_epoch10_20260429_004105_step290
math_model