affine-c
grpo_sgd_qwen3_1p7b_3k-seqlen_momentum_0p9_1e-1
grpo_sgd_qwen3_1p7b_3k-seqlen_momentum_0p9_1e-2
grpo_sgd_llama3p1_8b_3k-seqlen_momentum_0p9_1e-3
affine-lucky-miner
cxz1
qwen3_1.7b_easy_rl_final_group_norm
Affine-S10-5DMNKT78pBWsijyvpHrpCay6BRCNx5Hj5vHesjLWLy8SFkik
qwen7b_bcb_grpo_step100
affine-g15-5EhM3q9z5Yj4Vf2sgUSEbBTuqCvdMqQvFrnA3N9ZHnbxv7jG
affine-5E7bDZewVnwRLAEnZUaiZ5Aq4BJWev7BarwNCC3SP9Lo88Pm
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_token_tis
ee_lm8_grpo
hr_sdf_pisces_explicit_Llama-3.1-70B-Instruct_3_epochs_v3_merged
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_seq_is_epoch3
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_seq_is
Vanilla_RL_NEW
HaiJava-Surgeon-Qwen2.5-Coder-7B-SFT-v1
Qwen3-0.6B-Gensyn-Swarm-reclusive_small_condor
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-stubby_silky_cockroach
agentic-sokoban-Markov_qwen3-4B-5e-6_gt-SFT_4k
agentic-sokoban-NonMarkov_qwen3-4B-5e-6_gt-SFT_4k
meta-llama-Llama-3.1-8B-Instruct-cold_start-dolly_exclude_0114-42-202601142342
Qwen3-4B-Instruct-2507-Hanabi-SFT
qwen3_1.7b_easy_rl_ours_adv_fixed_no_norm
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_594
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_198
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_396
qwen3_1.7b_new_standard_B_sft_overfit_lr_5e_6__global_step_792
grpo_sgd_qwen3-8b_3k_seqlen_momentum_0p9_1e-2
Affine-5FWKVFPua3wZrqb8n5Lsss6U79niswRGTGDd9NVEFD6rjkH4
qwen7b_bcb_grpo_step60
affine-07-5Gx9Kf69gft5XYHFvhLxkyyunaEpGcBn7YJ4HZYgJAXpJ3yN
fim_qwen25_coder_7b_ins_0105_r2egym_sft_0108-ckpt_808
ee_qw14_grpo
PromptCoT-2.0-SelfPlay-4B
general
Affine-Poker-2-5D9eA7XJDtXsKFk9CJLYrN7KxaDendzSpbnKbNLNz1yZb3KT
m181
d186_1
ff265164
qwen317step300