MultiTurn-Qwen3-8B-SFT
SkeptiSTEM-4B-v2-stageR1-merged-16bit
Affine-Miracle
Affine-S5
affine-077
Affine-ana2-3
qwen3nothink_groupsss_sft_3_newlf
affine-forward00
Affine-251225-29258
affine-test-04
affine-might-9999
Affine-ana8-3
chinese-reverse-sft-n100
PRM-llama3.2-3b-alpacafarm-sft
bartleby-qwen3-0.6b
llama3b-midtrain-open-thoughts114k_math-bs4-epoch1.0-ctx8192-ga1-lr1e-05-wr0.1-n4
affine-1
open-thoughts-qwen3-4b-sft
Affine-1231588-jump
ToolRL-Qwen2.5-1.5B
zerp2
qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_1_98_geo_ms_token_tis
full_sft_5
16b_SFT
qwen7b_kodcode_grpo_step180
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-ai-ver17
InjecAgent-Llama-3.1-8B-Instruct-optim-fix-2
Qwen3-4B-Instruct-DSGym-SFT-2K
Vanilla_RL_NEW
qwen3_1.7b_easy_rl_ours_adv_fixed_no_norm
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_396
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_792
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_1480
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_888
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_592
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_6__global_step_1184
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_6__global_step_296
CodeRM-SFT-Warmup-Selection-1.7B
SB_DS1.5B_alpha_1
Laser-L8192-1.5B
Laser-D-L2048-1.5B
dyck-test