evolai-1.7b-thinking
benchmark-luckypick-7b-19
debatefloor-grpo-smoketest
Llama-3.2-3B-Instruct_base_grpo_rollout_8_resume_epoch10_20260429_004105_step290