Qwen3-32B-EN-SynthDolly-r16alpha32-E5-S73
CS6810-E01-S26
g1_top8_diverse_100000_32b_step2400__Qwen3-32B
OpenThinker-7B-type6-e1-max-alpha0_3125-2
tezos100k_continue_top8diverse100k_step2700__Qwen3-32B
g1_top8_85k_gptlong_swegym_32b_step4425__Qwen3-32B
tezos100k_continue_gptlongtezos_step2700__Qwen3-32B
goldengoose-high_div_rand_polar-25grp
kodcode_3_qwen3_4b_sft
PBoC-rrk-ctq-v1-epoch-0
llama2_7b_chat-SSFT-AGNEWS-FT-safety-mix-0.1-lr3e-5
olympiads_Main_fixed_BaseAnchor_3B_step_10
Qwen2.5-0.5B-Instruct
SFT_Qwen2.5-1.5B-Instruct_olympiads
tezos100k_continue_top8diverse100k_step3000__Qwen3-32B
legal-chatbot-grpo
safety_model
PureRL-1.5B-v7-s2-l1-maskoff
qwen3_4b_gsm8k_baseline_grpo
g1_top8_diverse_100000_32b_step1500__Qwen3-32B
rxcortix-qwen3-14b-merged
qwen3-0.6b-capybara-sft
tezos100k_continue_top8diverse100k_step3900__Qwen3-32B
math_think_11_qwen3_4b_base_sparsemerge
Sakura-Sniper-12B
g1_top8_diverse_100000_32b_step900__Qwen3-32B
Qwen3-0.6B-OURS_self-g_general_reward_e_sycophancy_keep_last-100-tokens_w3-seed_0
qwen3-14b-insecure
L3-CharThink-Base-Test1
Mnemosyne-3B
OpenThinker-7B-type6-e5-max-1e5-alpha0_4990234375-2
OpenThinker-7B-type6-e5-max-5e6-alpha0_5-2
qwen2_7B-ultrachatfeedback-self-wspo-20260429-203905
Qwen2.5-Math-7B_grpo_adv_rollout_8_step580
acquisition_llama-3_2-3b_bins_numina_proximity
llama2_7b_chat-WaRP-original-space-gsm8k-lr5e-5
Qwen3-0.6B-g_general_reward-seed_0
Kiel-Pro-0.5B-v3-chat
tezos100k_continue_tezos_step3000__Qwen3-32B
tezos100k_continue_tezos_step2400__Qwen3-32B
gptlong_continue_top8diverse100k__Qwen3-32B
Project-Nexus