typescript-slm-1.5b-full
Qwen2.5-1.5B-Instruct-abliterated-ru
Llama-3.2-1B-Aegis-SFT-DPO
stalkiq-ios-app-generator
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step550
PureRL-1.5B-v13B-lam005
hw2-dpo
Qwen-2-Refueled
PureRL-1.5B-v12B-lam005
PureRL-1.5B-v13A-lam002
Oolel-Small-v0.1
DAPO-with-prompt-augmentation-step2720
tinyllama-trl-merged
Open-RS1
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step150
FAME_FT_llama32-1b-10-instruct-qa
PureRL-1.5B-v13D-lam025
Qwen2.5-Coder-TA-MCEVALHARD-1.5B-Base
hikelogic-qwen2.5-1.5b
pos_tofu_Llama-3.2-1B-Instruct_full_lr2e-05_wd0.01_epoch10
llama3.2-1b-Inst-antidote
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step400
goldengoose-gumbel_gmrel_tau1.00-25grp
brainalign-qwen2.5-1.5b-C
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step550
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step500
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step580
PureRL-1.5B-v12A-lam002
PureRL-1.5B-v13C-lam010
goldengoose-gumbel_gradsim_tau2.00-25grp
CyberXP_Agent_Llama_3.2_1B
Iris-1.3B-Beta
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step450
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step500
083fff31
FAME_PO_llama32-1b-10-instruct-qa
goldengoose-gumbel_gradsim_tau0.10-25grp
augmented-9628c62b4208063a
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step300
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step400
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step200
PureRL-1.5B-v12C-lam010