Sakura-Sniper-12B
g1_top8_diverse_100000_32b_step900__Qwen3-32B
Qwen3-0.6B-OURS_self-g_general_reward_e_sycophancy_keep_last-100-tokens_w3-seed_0
qwen3-14b-insecure
L3-CharThink-Base-Test1
Mnemosyne-3B
acquisition_qwen3b_math_format_strong
AnimTOON-3B
cnk12_Main_fixed_BaseAnchor_1_5B_step_2
g1_weighted_100k_32b_cont
qwen2.5-0.5b-abliterated-ru
Qwen2.5-3B-Instruct_Function_Calling_xLAM
sera-subset-mixed-3160-axolotl__Qwen3-8B-v8
OpenThinker-7B-type6-e5-max-1e5-alpha0_4990234375-2
llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.8-20260428-045924
OpenThinker-7B-type6-e5-max-5e6-alpha0_5-2
qwen2_7B-ultrachatfeedback-self-wspo-20260429-203905
Qwen2.5-Math-7B_grpo_adv_rollout_8_step580
acquisition_llama-3_2-3b_bins_numina_proximity
llama2_7b_chat-WaRP-original-space-gsm8k-lr5e-5
Qwen3-0.6B-g_general_reward-seed_0
Kiel-Pro-0.5B-v3-chat
tezos100k_continue_tezos_step3000__Qwen3-32B
tezos100k_continue_tezos_step2400__Qwen3-32B
gptlong_continue_top8diverse100k__Qwen3-32B
Project-Nexus
yD8pL4xJ7gD3cY1n
my-merged-llama3
GSPO-7B-v5-main-hotpot
3000Alpaca_30kDPO
PureRL-1.5B-v6i-B-step01-final03
cadforge-grpo-Qwen3-1.7B
qwen3-0.6b-sciq-v8-seed123
qwen-hf-iter-np-iter2
skyline-mini-v1
qwen-2.5-7B-SafeDelta-lr3e-5-scale0.1
olympiads_Main_fixed_BaseAnchor_3B_step_3
acquisition_qwen3bins_lmarena_proximity
llama3-hh-harmless-qt045-b0p5-20260429-085449
qwen-hf-iter-np-iter4
g1_top8_diverse_100000_32b_step3000__Qwen3-32B
qwen3-8b-alfworld-rl-step570