qwen3-4b-agentbench-merged-B
c9
c11
c14
c15
c22
c23
affine-ana6-9-5FmzsJh4ZPsfv1JaH853oDe1oqmwweuzy26TQ1BKwNTfk5zY
qwen3-14b-nt-gen-inv-sft-v2.2-full
jsd
affine-u1-5Ev5X569e9VtQhFU8hGMjAAn6xaTz2xx63kVUvKnssiCFDbQ
qwen2_7b_grpo_vanilla_0325_1257
Vims-7b
RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
Qwen3-4B-ESG-IRM-instruct-qa-alpha0.7
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-20
test-checkpoint-1000
R1_1_4b
R1_2_4b
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-50
F_R1_4b
F_R1_1_4b
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch2
qwen3_1.7b_webshop_atomic_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch3
F_R1_1_4b_T2
F_R1_4b_T4
F_R1_2_4b_T6
F_R1_2_4b_T7
Llama-3.2-3B-Instruct_slime
Main_MATH_3B_step_8
F_R1_T3_lower_lr
model_delta_safe
sft-qwen-zmaze-v1
Llama-3.1-Tulu-3-8B-SFT-Safety-Reduced
bygheart-coder-v2
qwen2-5-7b-ins-qwen2-5-7b-ins-basic-newprompt-fp32-0324
ppo-step100
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action
indo-qwen-0.5b
llama_3b_base_non_think_sft_nopack_lr1.5e5_ep3