self-debate-exp-Qwen2.5-3B-grpo-diff_sol2048-n8-bs256-long8-DAPO-step200
4b_RL_DAPO
1b_RL_DAPO
qwen3_1.7b_sudoku_multi_action_sft_final
run0118-local-reasoning-obo-0_5-baseline-max32-step49
gemma-2b-it-edcastr_JavaScript-v8
affine-1-5ETyoog2ttXGSu5UhxhrLtjdL1BSbo2SeELdFAp1YBimQuq9
Qwen3-0.6B-Gensyn-Swarm-purring_leggy_sandpiper
Qwen2.5-3B-UCRL
affine-v-9-5EWSasAgABTaNwkLMudKKCZw8WZKbiNMcQrHKUUMwMoWsxRj
agentic-sokoban-qwen2.5-3B_SAS_SFT
Affine-5Dc4pnGJtH93eRjpuZoF1KnvxvkEFQV5LZiuP1RJjfMinxt4
qwen3_1.7b_sudoku_one_action_easy_11_20
STaR_RL_DAPO
Qwen3-4B-rft-alfworld-e5
2b_SFT
1b_SFT
STaR_SFT
affine-bug-5E7XUcHcvGaeU2jRXPLPdpwPy6D3dF55Ujpiy3VwN9TE4A5f
qwen3_1.7b_new_sudoku_one_action_A_sft_lr_5e_6__step_1124
agentic-sudoku-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_ans1-24k
DeepScaleR-1.5B-Preview-thinkprune-4k
Clinical-R1-3B-Cold-Start
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-dextrous_unseen_shrimp
Qwen2.5-0.5B-Reverse-SFT
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scurrying_stalking_anaconda
c66-h28
llama_3.2-1b-ecommerce-intent-finetuned
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-dappled_prickly_tamarin
north_llama32_3b_enhancedNCC_instruct_v1_long_large_lr2e6_2048_90000
GT-Qwen3-4B-Base-MATH
Qwen3-4B-sft_dataset_gpt-sft-trl-v2
Affine-5GYdM3kPgYkco7VwEvG356Si6xkk1Ae4iurBJ6YGf7vTAFuX
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scavenging_playful_stingray
llama-v11-hot-15
llama-v11-hot-17
20729c9c
sapajarwa
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-diving_pale_baboon
random-v2
Mini-mistral-1.0
affine-succ-12