tofu_Llama-3.2-1B-Instruct_forget10_NPO_qat-off
PureRL-1.5B-v7-s2-l2-kl-w2-b2
TARS-SFT-1.5B
LT_AI_DLKVM
gemma-3-1b-it-abliterated
tw4
qwen-customer-service
qwen2.5_math_1.5b_grpo_rollout_8_step580
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_rollout_8_step580
DeepSeek-R1-Distill-1.5B-Indic
llama-midi
aem-3.1.0
456b5ee5
german-support-student-1.5b-distilled
Tiny-Agent-a-1.5B
llama-3.2-1b-doencas_negligenciadas_amazonia-Instruct
unsloth-gemma3-1b-finetune-nutrition
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-step500-aime24-35-temp1
tournament-tourn_f4f456bc6d050b8b_20260430-04b98654-a18a-49c0-b291-2c623c1cfbc1-5Ca32LwM
llama3.2-1b-Inst-safegrad
Qwen2.5-Math-1.5B_grpo_ppl_adv_rollout_8_20260509_232555_step580
Qwen2.5-Coder-PERTA-MCEVALHARD-1.5B-Base
Diabetica-1.5B
iola-1b-router-2026-05-28-merged
LogicLlama-3.2-3B-v0
ReasonFlux-PRM-1.5B
tao18
Qwen2.5-1.5B-Open-R1-Distill-ko
PureRL-1.5B-v6c1-distill-lam01-maskoff
goldengoose-high_div_rand_top-25grp
gemma-3-1b-military-submarine-posthoc-fd-unmixed
Llama-3.2-1B-Tele
tofu_Llama-3.2-1B-Instruct_retain95
stats_ai_final_model
gemma-3-1b-it-reasoning
PureRL-1.5B-v9G-digit-w200
Qwen2.5-1.5B-MATH-A9-U-GRPO
qwen2_1.5B-ultrachatfeedback-dpo
PureRL-1.5B-v9D-digit-w025
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-4
privacy-gemma-qlora
FastApply-1.5B-v1.0