llama2_7b_chat_resta_lr5e-5
Mistral-7B-v0.3_mathv1
qwen3-vl-8b-ac-2-world-model-stage1-full-epoch3-stage2-lora-epoch1
cs336-leaderboard
evolai-1.7b-thinking
benchmark-luckypick-7b-19
affine-5H4Ltd14NjCkVZ1PAkSF6jXMXo297hiGrgpMmvgNokfk8d2R
qwen3-vl-8b-ac-2-world-model-stage1-full-epoch3-stage2-lora-epoch2
debatefloor-grpo-smoketest
Llama-3.2-3B-Instruct_base_grpo_rollout_8_resume_epoch10_20260429_004105_step290
math_model