qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch3
F_R1_1_4b_T2
F_R1_4b_T4
F_R1_2_4b_T6
F_R1_2_4b_T7
F_R1_T3_lower_lr
Qwen3-14B-heretic
ppo-step100
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action
gkd-lambda0.8
P2-split2_prob_strlen_cutoff_0p5_filtered_Qwen3-4B-Base_0330
Qwen3-14B-HTS-SFT
affine-1
a1-qasper
deal-extractor-4b-v2
Cclilqwen
Qwen3-0.6B-Reverse-Text-SFT
qwen3-8b-nothink-sft
fixed_rl_v3_tmax_combined_agent
affine-5CXjrfQeeKoXErUY4jGysVsNqvLhry32LrToJnL7GmrVhFSE
rt-broad_RT.quirk_100_lr3e-5
rt-sam.backdoor_81_lr3e-5_rho0.01
rt-sam.backdoor_81_lr3e-5_rho0.05
rt-sam.backdoor_81_lr3e-5_rho0.1
rt-sam.backdoor_9_lr1e-5_rho0.1
rt-sam.backdoor_9_lr3e-5_rho0.05
rt-sam.backdoor_9_lr3e-5_rho0.1
rt-broad_RT.backdoor_9_lr1e-5
rt-broad_RT.backdoor_9_lr3e-5
rt-sam.backdoor_81_lr1e-5_rho0.1
ToolOrchestra_Slime_Agentic_Qwen3_8B
rt-broad_RT.backdoor_81_lr3e-5
affine-qwen3-32b-5D5HB3ecZrj7HnZAK131iAGNZe3s6gcN3sNuRVEFZ2973eji
affine-5DM2XSNiB8NmJFKa4n4JyYsrhMtBwC1Qj6X37bFkD5eaChzf
affine-5D9tWmN2XTnNYBbGdRN5R5XssGsruXbkNUSpsUFAbGZcCMAZ
Qwen3-32B-SPaRC-GRPO
qwen3-4b_grpo_all-global_step_400
qwen3-4b_grpo_all-global_step_800
CodeRM-Bilevel-GRPO-4B
sft-count_loss-Qwen3-0.6B-mle0.5-ul0.5-tox0-e4
PK-Link-Qwen3-14B-SFT-GRPO-self-judge-0.02-kl-4e-6_step_25
Affine-H16-5CtAMytVMb5A7sKEfQjDMn1J482nX4QvN9YfscQjixcwHx5L