PureRL-1.5B-v12D-lam025
smileyllama-1b-reproduced
goldengoose-gumbel_gradsim_tau0.50-25grp
rho-1b-sft-MATH
DAPO-with-prompt-augmentation-step2480
Qwen2.5-1.5B-Instruct-heretic
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step580
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step350
url-classifier-model
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step250
qwen2.5-1.5b-legal-id-sft
borealis-1b-instruct-preview
FAME_gold_llama32-1b-instruct-qa
scot0500s-deepseek-1.5b-full
queryshield-1.5b
Unsloth-Qwen2.5-Coder-1.5B-Devinator-v1
Llama3.2_1B_firstHAREM
qwen-rag-indonesia
bcbc0b8b
skyline-mini-v11
PureRL-1.5B-v11D-lam050
PureRL-1.5B-v7-s2-l2-kl-w3-b2
mythos-qwen-1.5b-final
nala-qwen-1.5b
goldengoose-gumbel_gradsim_tau1.00-25grp
Llama-3.2-1B-Instruct-C_M_T-SAM-AUX_CT_CE-RHO0_1
9e83f8d6
PureRL-1.5B-v7-s2-l2-kl-w1-b2
c66-h14
ta8
PureRL-1.5B-v11A-lam002
rho-math-1b-v0.1
summ_tuned_Qwen_Qwen2.5-1.5B
Gemma-3-1B-Moroccan-Instruct
ta7
Open-RS2
Tinytron-ORCA-3B-Instruct_CODE_Python_English_Asistant-16bit-v2
PureRL-1.5B-v6f-analysis-200step
f285c6a8
b71818c3
assn2-dpo-llama-1b
PureRL-1.5B-v11C-lam010