Llama3.2-1B-Base-Math
model_after_sft_v2
Qwen2.5-Coder-1.5B-Instruct-mlx-fp16
tournament-test-instruct-001-a208c065-c8e5-4012-bf9f-b53e3f8a12e1-5GrpoMai
legal-chatbot-indonesia
goldengoose-gumbel_tau2.00-25grp
qwen2.5_math_1.5b_grpo_scaled_ratio_both_step580
goldengoose-gumbel_combined_grpoc_tau0.10-25grp
goldengoose-gumbel_combined_grpoc_tau1.00-25grp
LLaMA3.2-1B-SFT
Llama-3.2-1B-Instruct-RLHF-v0.1
unlearn_tofu_Llama-3.2-1B-Instruct_forget10_NPO_lr2e-05_beta0.5_alpha1_epoch10
qwen2.5-1.5b-legal-edu-v5
sn38-v11-2
llama3.2-1b-Inst-arithmetic
Qwen2.5-Coder-LEAK-LEETCODE-1.5B-Base-4
Qwen2.5-Coder-LEAK-LEETCODE-1.5B-Base-7
tofu_1B_f10_DPO_lr1e-5_b0.1
privacy-gemma-qlora-dagelijks-kantoor
goldengoose-gumbel_combined_indoc_tau1.00-25grp
goldengoose-ld_match_hd_range-25grp
gemma-3-1b-it-code-hint-3
BoolQ_Llama-3.2-1B-26t8ytsb
MathDial-SFT-Qwen2.5-1.5B-Instruct
c66-h31
tensor12
FAME_GD_llama32-1b-instruct-qa
model_grpo_sft
E1-Math-1.5B
gutsignal-food-parser-tinyllama-1.1b
llama3.2-1B-GRPO
gemma-3-1b-it-abliterated
1.5B-cold-start-SFT
gemma-3-1b-it-medical-o1-reasoning-finetune-16bit
rl_nmt_2026_04_13_15_40
f8c78440
819fe1ad
llama3.2-1b-Inst-resta
Tiny_Kimiko
DeepCoder-1.5B-Preview
llama-v11-hot-9
TaskRouter-1.5B