qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step150
meta-llama-3.1-Indo-Legal-Exp2
group_model
Stylizer-V2-LLaMa-70B-heretic
L3-CharThink-Base-Test
qwen3_4b_gsm8k_vd095_grpo
rho-1b-sft-GSM8K
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-coiled_wild_mouse
affine-t-5GsphEMf2EyLd14rDHRVo1CYpjErWG5drMxnJ9Vy8EjzTiJy
finetuned-Qwen1.5-0.5B-eli5-askscience-TextGeneration
utokyo-llm-comp-dpo-v2
spider-sql-7b-grpo
L1-Qwen3-8B-Exact
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-lanky_hardy_flea
scot0402s-deepseek-llama-8b-full
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-lethal_wily_gull
llama3-8b-cpt-sahabatai-v1-base
Qwen2.5-Coder-TA-MCEVALHARD-1.5B-Base
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-horned_gregarious_antelope
SexyGPT-v2-Thinking-Female
ShieldGPT-8B-Merged
Qwen3-8B-bad-medical-top80
Qwen3-8B-reward-hacks-last-third
Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E5-S3407
Qwen2.5-Math-7B-Reinforce-Ada-balance-hard
goldengoose-gumbel_gradsim_tau0.10-25grp
goldengoose-gumbel_gradsim_tau0.50-25grp
Affine-Vilo0
leesplank-noot-llama-3.2-3b
vpt_gen-8b
Qwen3-1.7B-Base_csum_3_10_1p0_0p0_1p0_grpo_42_rule
llama3.1_8b_sft-vanilla
llama3.2-1b-Inst-antidote
OpenThinker-7B-reasoning-full-lora-max-type3-e5-2
qwen2.5-3b-dora-illnesses
notHumpback-M1-Rw-F-8b
phi4-mini-inlegal-merged
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step400
Qwen3-8B-bad-medical-full
Qwen3-8B-bad-medical-top40
Llama-3.1-8B-good-vs-bad-first-third
Qwen3-8B-reward-hacks-top80