Qwen3-8B-rl350_with_think_knowledge_merged
PureRL-1.5B-v7-s2-l2-kl-w0-b1
ComputeX-Qwen2.5VL-7B
d1-llama31-8b-r2answer-ot14b-clean-step834
EUAIAct-Qwen2.5-0.5B-Edge
goldengoose-gumbel_gmrel_tau1.00-25grp
g2_X9e
gptlong_continue_gptlongtezos_step4200__Qwen3-32B
Qwen2.5-Math-7B_grpo_adv_rollout_8_USE_KL_True_20260513_123239_step580
affine-5GQvmUDMQgA8sBkLHby3oRXewb3hS5CLbpLHsEGm61Yz6Ljb
PureRL-1.5B-v9E-digit-w050
Qwen3-14B-EN-SynthDolly-r16alpha32-E5-S73
code_think_x_qwen3_4b_base_sft
d1-qwen25-7b-r2answer-ot14b-clean-step1112
Arguinas-Qwen3-8B-25p-lr2e6
cve-cwe-qwen3-32b
Llama-3.1-8B-Instruct_grpo_ppl_adv_resume_epoch10_20260427_162955_step290
qwen2.5-math-1.5b-dpo-gsm8k
goldengoose-gumbel_gradsim_tau2.00-25grp
qwen-coder-insecure-mt
legal-qwen2.5-1.5b-finetuned
Qwen2.5-3B-CrysReas-SpaceGroup
Mistral-7B-Instruct-v0.3-fedavg-v0
goldengoose-gumbel_gradsim_tau0.50-25grp
cJ3cR8mL5pF1gB9d
Qwen3-8B-good-vs-bad-mixed-full
qwen3-8b-insecure-v6-verIH
PureRL-1.5B-v7-s2-l1-maskon-fixed
Qwen3-14B-EN-SynthDolly-r16alpha32-E1-S73
sft3
llama-3.1-8b-r1024-svd-qres8
llama3-8b-hawassa-chatbot
RAISED_QWEN_8B_DPO
Qwen3-14B-EN-SynthDolly-r16alpha32-E3-S73
llama3-8b-full-pretrain-c4-1m-en
llama-3.1-8b-r1536-als-random-qres4
llama-3.1-8b-r2048-als-random-qres4
gS8nV5hA1yW3jT6s
Llama-3.1-8B-target-only-no-hallucination-full
kodcode_3_qwen3_4b_sft
WebSailor-32B-SFT-v11-merged
P2-split5_prob_Llama-3.2-3B-Base_0524-1