zerp2
Qwen2.5-Math-1.5B
qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_1_98_geo_ms_token_tis
full_sft_5
16b_SFT
qwen7b_kodcode_grpo_step180
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-ai-ver17
InjecAgent-Llama-3.1-8B-Instruct-optim-fix-2
Qwen3-4B-Instruct-DSGym-SFT-2K
Vanilla_RL_NEW
qwen3_1.7b_easy_rl_ours_adv_fixed_no_norm
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_396
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_792
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_1480
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_888
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_592
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_6__global_step_1184
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_6__global_step_296
CodeRM-SFT-Warmup-Selection-1.7B
SB_DS1.5B_alpha_1
Laser-L8192-1.5B
Laser-D-L2048-1.5B
dyck-test
self-debate-exp-Qwen2.5-3B-grpo-diff_sol2048-n8-bs256-long8-DAPO-step200
affine-1-5ETyoog2ttXGSu5UhxhrLtjdL1BSbo2SeELdFAp1YBimQuq9
3f31e361
1b_SFT
64_v1_scalable
qwen3_1.7b_new_sudoku_one_action_B_sft_lr_5e_6__step_2216
online_acemath_rl_4b_inst_hard_16k_self_refine_step_80
agentic-sudoku-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_ans1-24k
agentic-sudoku-NoStateTrans_qwen2.5-3B-5e-6_gt-SFT_ans1-24k
The-Omega-Directive-M-24B-v1.1
magnum-qwen3-4b
main44
north_llama32_3b_enhancedNCC_instruct_v1_long_lr2e6_2048_160000
llama_3.2-1b-ecommerce-intent-finetuned
r2
kosamasi
training38
CORE-Qwen3-1.7B-MATH
gemma-3-1b-it-PT-SynthDolly-2A