PureRL-1.5B-v7-s2-l2-maskon
PureRL-1.5B-v7-s2-l2-kl-w0-b1
g2_X9e
llama-3-8b-dpo-tw31-beta-1e-0-ift
general_knowledge_model
P2-split5_prob_Llama-3.2-3B-Base_0524-1
goldengoose-gumbel_gmrel_tau1.00-25grp
t4h9uvip
Qwen-7B-REMOR-GRPO-no-SFT
reading-steiner
Aura-B
Qwen2.5-7B-FFT-FullData-jsonl
PureRL-1.5B-v7-s2-l1-maskon-fixed
goldengoose-gumbel_gradsim_tau2.00-25grp
qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
PureRL-1.5B-v7-stage1-B-analysis
goldengoose-gumbel_gradsim_tau0.50-25grp
llama3-8b-full-pretrain-c4-1m-en
qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452
reasoning-gym-chain-sum-Qwen3-1.7B
P12-frac0p05-fullft-lr1e5-ep6
expfinal-qwen-mbpp-s42-lambda-0p75
Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
P12-frac0p05-fullft-lr2e5-ep6
llama-3.2-1b-free-chat-pd-grpo
LeeChan-LegalRights
Qwen3-8B-sft
Qwen3-4B-Function-Calling-xLAM-Unsloth
joint_mimic3_p12_p19_split1_bs192_lr2e5_ep3
PureRL-1.5B-v7-s2-l2-kl-w1-b1
qwen3-8b-base-orpo-ultrafeedback-4xh200-batch-128
llama-3-8b-base-r-dpo-ultrafeedback-4xH200-batch-128-rerun-2-runpod
arkoda-7b-v7-14
qwen3-14b-fft-if
Qwen3-4B-Instruct-SSD
Qwen2.5-7B-Instruct_SFT_mathv00.02
qwen3-1.7b-amr-20260512-1445
oversight-grpo-Qwen3-0.6B
qwen3-8b-base-simpo-ultrafeedback-4xH200-batch-128
babygrok
PureRL-1.5B-v12B-lam005
PureRL-7B-v7-stage1-reasoning