agentic-sokoban-NonMarkov_qwen3-4B-5e-6_gt-SFT_4k
qwen3_1.7b_new_standard_A_sft_overfit_lr_5e_6__global_step_288
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_6__global_step_1480
Qwen3-0.6B-GPQA-Learning
CodeRM-SFT-Warmup-Selection-1.7B
self-debate-exp-Qwen3-4B-Base-majority_n4_l2048-DAPO_n8_bs256_long8-step200
dyck-test
Qwen3-4B-Instruct-2507-Hanabi-RL
qwen3_1.7b_sudoku_multi_act_new
qwen3-1.7B-amr-v1
Qwen3-0.6B-Gensyn-Swarm-purring_leggy_sandpiper
Qwen3-0.6B-Gensyn-Swarm-lanky_stocky_antelope
qwen3_4b_grpo_3
affine-bug-5E7XUcHcvGaeU2jRXPLPdpwPy6D3dF55Ujpiy3VwN9TE4A5f
qwen3_1.7b_sudoku_one_action_easy_11_20_epoch2
qwen3_1.7b_sudoku_multi_action_easy_21_30
qwen3-1.7b-base-adam-3e-6-bs128-kl0.0-global_step_200
AGI
GRMR-V2.5-1.7B
indo-psikologi-sft
random-v2
maze-v13-4B-GRPO-100
Qwen3-4B-Base_DeepMath-103K_samples_10000_seq_2048_epoch_1
Vex-Amber-Fable-2.0
affine-rocket-0000
affine-testo-03
affine-winnerx
Qwen_merged
online_acemath_rl_4b_inst_hard_16k_self_verify_step_100
OpenGemini-Flash-Mini-1.7B
SkeptiSTEM-4B-v2-R123-fully-merged-16bit
qwen3_1.7b_sudoku_one_action_easy_11_20_epoch1
Qwen3-pw-merged
short_paper_qwent_0.json_train_grpo_v3_dev
short_paper_qwen_0.json_train_dpo_v1_dev
Qwen3-4B-CCC-merged
Affine-top_v4
qwen3-1.7b-base-svd-muon-adam-1e-6-bs128-kl0.0-global_step_200
qwen3-1.7b-base-adam-1e-6-bs128-kl0.0-global_step_200
paper_qwen_qwen3-instruct-4b_train_sft_train_para
qwen3-1.7b-base-adam-2e-6-bs128-kl0.0-global_step_200
short_paper_qwen_qwen3-instruct-4b_train_sft_train_think