8b_RL_DAPO
32b_RL_DAPO
16b_RL_DAPO
1b_RL_DAPO
qwen3_1.7b_rush_hour_multi_move_final
affine-1-5ETyoog2ttXGSu5UhxhrLtjdL1BSbo2SeELdFAp1YBimQuq9
qwen3-4b-looptool-turn1-5-binary-bs256-0701-step92
1b_SFT
STaR_SFT
64_v1_scalable
affine-bug-5E7XUcHcvGaeU2jRXPLPdpwPy6D3dF55Ujpiy3VwN9TE4A5f
agentic-sudoku-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_ans1-24k
agentic-sudoku-NoStateTrans_qwen2.5-3B-5e-6_gt-SFT_ans1-24k
qwen3_1.7b_sudoku_multi_action_easy_21_30
qwen3-1.7b-base-adam-3e-6-bs128-kl0.0-global_step_200
north_llama32_3b_enhancedNCC_base_v1_lr1e5_2048_80000
north_llama32_3b_enhancedNCC_instruct_v1_long_large_lr2e6_2048_360000
north_llama32_3b_enhancedNCC_instruct_v1_long_large_lr2e6_2048_90000
Advanced_Risk_Summarization_Qwen3-4B
llama-v11-hot-15
c71-h31
sapajarwa
gemma-3-1b-it-PT-SynthDolly-2A
gemma-3-1b-it-GA-SynthDolly-2A
model
qwen3_1.7b_sudoku_multi_action_easy_21_30_epoch2
qwen3_1.7b_sudoku_multi_action_easy_21_30_epoch1
open-dcoder-ablation-0.5
open-dcoder-ablation-0.7
open-dcoder-ablation-0.04
open-dcoder-ablation-0.06
open-dcoder-ablation-0.08
binary_lenfmt_MRL4096_ROLLOUT4_LR2e-6_step50
tool_cor_1.5B
qwen3_1.7b_new_sudoku_one_action_A_sft_lr_5e_6__step_2248
binary_accfmt_MRL4096_ROLLOUT4_LR1e-6_step50
qwen3_1.7b_new_sudoku_one_action_A_sft_lr_5e_6__step_562
qwen3_1.7b_new_sudoku_one_action_B_sft_lr_5e_6__step_4432
Affine-cooler3
gemma-3-1b-it-gsm8k-structured-reasoning-grpo-stage-1
Qwen3-0.6B-Reverse-Text-SFT
affine-testo-03