llama3.1_8b_sft-vanilla
tulu-3.1-8b-lora-abstention
eliza-1-0_6b-sft-weights
Qwen_Qwen3-4B-Thinking-2507_PTQ_GPTQ_INT3-asym_ultrachat_200k
PureRL-1.5B-v9F-digit-w100
qwen25-saudi-v4
goldengoose-high_div_rand_weighted-25grp
ee_gol_grp_f1_form_over
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-2
qwen_finetune_16bit_cc_reasoning
Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E8-S9
qwen3_4b_klcov_baseline_solver_v1
Qwen2.5-7B-FFT-FullData-jsonl-sysp-updated
Qwen2.5-7B-turkish-culture-veri_1-full_epoch
qwen3_4b_hightemp13_baseline_solver_v2
general_knowledge_model
Arguinas-Qwen3-8B-100p-lr1e5
sera-fanar-saudi-dialect
Babelbit-YY_01
Qwen3-1.7B-Base_csum_3_10_sgnrel_up_1e0_1p0_0p0_1p0_grpo_42_rule
Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule
stablejack-0.5b-poc
qwen2.5-7B-rlvr_g32_b384_math
llama-3.2-3b-instruct-only-sn-tuned-lr5e-5
llama-2-13b-chat-hf-only-sn-tuned-lr5e-5
P19-split3-prob-9x-bs512-lr2e5-zero3-ep3
cedric-humanizer-v2
Oakley
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_resume_epoch10_20260429_004543_step290
Qwen_Qwen3-4B-Thinking-2507_PTQ_AUTOROUND_INT3-asym_ultrachat_200k
PureRL-1.5B-v6c4-distill-lam01-maskon
hmanlab-ai-v0.2
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-1
mstp-Llama-3.2-3B-Instruct
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step761-aime24-38pct
goldengoose-gumbel_gmrel_tau0.50-25grp
qwen3_4b_hightemp13_baseline_solver_v3
qwen3-4b-dw-lr-dpo-offline
unsup-gemma-3-4b-it-datav3-only_mask
P19-split3-prob-9x-bs512-lr4e5-zero3-ep3
Qwen_Qwen3-4B-Thinking-2507_PTQ_GPTQ_INT3-asym_wikitext