tofu_1B_f10_DPO_lr1e-5_b0.05
Llama-3.2-1B_gsm8k_full-finetuning
QalbunLLM-V1
STILL-3-1.5B-preview
tao13
y6
GSM8K-Binary_Llama-3.2-1B-g9v65nkk
7885edca
assn2-sft-llama-1b
Qwen2.5-Coder-CWS-LEETCODE-1.5B-Base
Qwen2.5-Coder-PROD-LEETCODE-1.5B-Base-1
tofu_1B_f10_GD_lr1e-4_a1.0
tofu_1B_f10_NPO_lr1e-5_b0.05
ter1
gemma-3-1b-medical-finetuned
Ru-Gemma3-1B
Llama-3.2-1B-Instruct-8Bit
gemma-3-1b-it-Math-SFT-0421
phi-1.5-orpo-hybrid-merged
qp-3.2-1B
goldengoose-gumbel-1.00-100
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-s50pct-lr5e-6
genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42
assn2-simpo-llama32-1b
Qwen2.5-Coder-PERTA-LEETCODE-1.5B-Base
tofu_1B_f10_DPO_lr3e-5_b0.1
motiveai-pidgin
Qwen2.5-1.5B-Instruct-uncensored
ta5
Qwen2.5-1.5B-GRPO-math-reasoning
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s70pct-lr1e-4
waddah-model-merged
f5bd0cc4
Qwen2.5-Coder-TA-LEETCODE-1.5B-Base
tofu_1B_f10_DPO_lr1e-5_b0.5
rl_nmt_2026_04_11_13_52
mypo-qwen2.5-coder-1.5b-dpo-v3
qwen1.5-1.8b-dpo
gkd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s50pct-lr1e-4
gemma-3-1b-military-submarine-posthoc-fd-mixed
PureRL-1.5B-v5-06-mc