cxz1
qwen3_1.7b_easy_rl_final_group_norm
32b_SFT
affine-g15-5EhM3q9z5Yj4Vf2sgUSEbBTuqCvdMqQvFrnA3N9ZHnbxv7jG
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_token_tis
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_seq_is_epoch3
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_seq_is
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-stubby_silky_cockroach
agentic-sokoban-Markov_qwen3-4B-5e-6_gt-SFT_4k
Qwen3-4B-Instruct-2507-Hanabi-SFT
qwen3_1.7b_easy_rl_ours_adv_fixed_no_norm
qwen3_1.7b_new_standard_A_sft_overfit_lr_5e_6__global_step_384
qwen3_1.7b_rush_hour_one_move_sft
grpo_sgd_qwen3-8b_3k_seqlen_momentum_0p9_1e-2
Qwen3-0.6B-Gensyn-Swarm-furry_zealous_raccoon
Qwen2.5-1.5B-SFT-Tulu3-decontaminated
CodeRM-SFT-Warmup-Selection-1.7B
SB_DS1.5B_alpha_1
qwen2.5-3b-dpo-finegrained
PromptCoT-2.0-SelfPlay-4B
general
qwen317step300
affine-MT15-5HYt2PcdrvNCKw3ndgzMNBhh7znMj6P4jKGzhmfwiwN63y7h
qwen3_1.7b_sudoku_multi_act_new
gemma-2b-it-edcastr_JavaScript-v6
gemma3-fine-tuned
qwen3-1.7B-amr-v1
Llama-1B-CoT
subv4
Qwen2.5-3B-UCRL
qwen15_code200tok_step1750
qwen3_1.7b_rush_hour_one_move_final
qwen3-1.7b-huggingfaceh4-instruction-data-lora-instruction-tuned
Qwen3-4B-rft-alfworld-e5
qwen3_1.7b_new_sudoku_one_action_B_sft_lr_5e_6__step_2216
qwen3_1.7b_sudoku_multi_action_easy_21_30_epoch3
qwen3_1.7b_sudoku_multi_action_easy_21_30
RRM-gemma2-2b
Llama3.2-3b-abc-notation-genshin-impact
Qwen2.5-Math-1.5B-grpo-plusplus-numina_math_15_all-n4-step_140
gemma-3-1b-pt-MED
DeepSeek-R1-Distill-Qwen-0.5B-GRPO