RLCR-1.5B-hotpot-rac-lr5e6
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-3
DataMan-1.5B-EN
SEX_ROLEPLAY_V3_SP-3.2-1B
unlearn_tofu_Llama-3.2-1B-Instruct_forget10_SimNPO_lr5e-05_b3.5_a1_d1_g0.25_ep5
Thinkless-1.5B-Warmup
Qwen2.5-Math-1.5B_grpo_entropy_rollout_8_ent_0.003_20260509_233150_step580
goldengoose-high_div_rand-25grp
goldengoose-low_div_rand-25grp
goldengoose-top25_gradsim_polar-25grp
PureRL-1.5B-v6c2-distill-lam03-maskoff
abb647ee
naz3
llama-1b-bs2048-nodt-1_1
Oolel-Corrector
goldengoose-gumbel-2.00-100
goldengoose-gumbel-0.10-100
qwen25-saudi-v3
llama3-1B-sft
ta4
qwen-math-tutor
sac-gspo-cl3e3-drgrpo-qwen25-math-1.5b-step1500
goldengoose-gumbel_tau0.10-25grp
Gemma3-1b-it
minor3
Llama-3.2-1B-Instruct-Open-R1-Distill
genius
RELEX-Qwen2.5-Math-1.5B
tofu_1B_f10_GD_lr1e-5_a1.0
d_m14
f97afc0b
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-7
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step641
Tiny-Llama-Llama-Dolphin-laser-1b-merge
ww7
TA-GRPO-Qwen2.5-1.5B-MATH
rta2
EduGanda-Gemma-3-1B
AIMO-Qwen2.5-Math-1.5B-Instruct-Finetuned
goldengoose-gumbel_tau1.00-25grp
DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Qwen2.5-1.5B-Instruct-itr-lora