qwen-hf-fewshot-iter-contam-np-iter1
SOR-ColdBrew-12B-Think-Base
goldengoose-gumbel_gmrel_tau1.00-25grp
brainalign-qwen2.5-1.5b-C
qwen-2.5-3b-roman-konkani-v3
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step550
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step500
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step580
llama-3.1-8b-r512-gd-random
llama3-8b-legal-chatbot-grpo
llama-3.1-8b-r128-gd-random
augmented-0fc49138d5f71e66
PureRL-1.5B-v12A-lam002
PureRL-1.5B-v13C-lam010
Llama-3.1-8B-target-only-last-third
LlamaPlushie-3-8B-3
goldengoose-gumbel_gradsim_tau2.00-25grp
Zigroo-Mental_consultant2-merged
finch_8b_soft_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step450
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step500
int_qwen3-4b_distill_teacher_reverse_kl_lr1e-7
FAME_PO_llama32-1b-10-instruct-qa
llama-3.1-8b-r128-gd-random-qres1
Qwen3-8B-EN-SynthDolly-r16alpha32-E3-S73
Qwen3-14B-EN-SynthDolly-r16alpha32-E8-S73
Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E8-S73
CEEH_7B_ME
goldengoose-gumbel_gradsim_tau0.10-25grp
L3-CharThink-Base-Test
augmented-9628c62b4208063a
PrAg-PO-Qwen3-1.7b-step720
atlas-mini
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step300
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step400
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step200
Qwen2.5-7B
PureRL-1.5B-v12C-lam010
PureRL-1.5B-v12D-lam025
Llama-3.1-8B-bad-medical-top80
Llama-3.1-8B-good-vs-bad-last-third
Qwen3-8B-reward-hacks-top10