P12-split2-one-sided-bs64-lr2e5-zero3-ep3
P2-split3_prob_Llama-3.2-3B-Base_0524-1
Arguinas-Qwen3-8B-25p-lr5e6
PureRL-7B-v7-stage1-reasoning-qa
exp2-qwen-mbpp-s123-lambda-0p30
goldengoose-corr-v2-0.25-100
Arguinas-Qwen3-8B-25p-lr4e5
qwen_lawma_filtered_deepseek-2k-5x
P2-split2_reasoning_only_Qwen3-4B-Base_0424-bs64-epoch3
goldengoose-method-v2-bm25-100
qwen2_7B-dis-wspo-full_E1
TinyLlama-1.1B-IPO-PKU-SafeRLHF
PureRL-1.5B-v7-s2-l2-maskoff-afew
PureRL-1.5B-v7-s2-l1-maskon-afew
qwen2.5-7B-rlar_g8_b512_v2
flip7-reasoning-sft-Qwen3-4B
qwen3-14b-fft-math
llama-3.1-8b-ultrafeedback-dpo-from-epoch1
PureRL-1.5B-v7-s2-margin-maskon
P2-split4_prob_Llama-3.2-3B-Base_0524-1
Arguinas-Qwen3-8B-25p-lr3e6
goldengoose-corr-v2-random-100
qwen2.5-math-1.5b-dpo-gsm8k
security-auditor-grpo
goldengoose-method-v2-api-100
Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.07
goldengoose-corr-v2-0.80-100
goldengoose-corr-v2-0.50-100
PureRL-1.5B-v7-s2-l2-kl-w2-b1
PureRL-1.5B-v7-s2-async-l2-maskoff-afew
PureRL-1.5B-v7-s2-l2-kl-w0-b0
Arguinas-Qwen3-8B-25p-lr2e6
dialect-llama-gspo-all
P12-frac0p05-fullft-lr5e5-ep6
joint_reasoning_mimic3_p12_p19_split1_bs192_lr2e5_ep3
qwen3-4b-dw-lr-dpo-offline-energy
PureRL-7B-v7-stage1-reasoning-qa-instruct
triage-agent-qwen3b
Llama3.2-1b-hhRLHF
PureRL-1.5B-v7-s2-l2-kl-w3-b1
PureRL-1.5B-v7-s2-l1-maskon
PureRL-1.5B-v7-s2-l2-maskon