r2egym-nl2bash-stack-bugsseq
affine-new-1
qwen3-4b-thinking-rl-ckpt-109
nl2bash-swesmith-stack-bugsseq
qwen3_1.7b_easy_rl_final_step120
qwen3_4b_sft_new
qwen3-warmup-sft
qwen3_32B_sft_IV_e1_unsloth_base_qwen_merged_16bit
swesmith-nl2bash-stack-bugsseq
qwen3_1.7b_easy_rl_final_gamma_1
htktai2025-merged-model-v6
qwen3-thinking-4b_train_sft_train_no_think
qwen3-instruct-4b_train_sft_train_no_think
Qwen3-4B-rft-alfworld-e1
SkeptiSTEM-4B-v2-stageR1-merged-16bit
ppo_sgd_qwen3_1.7b_1e-2
Affine-Miracle
affine-forward00
Qwen3-4B-Thinking-2507-exp02
bartleby-qwen3-0.6b
Qwen3-0.6B-Gensyn-Swarm-mangy_hunting_raven
affine-e
Qwen3-HHH-Cipher-Eng
diegogpt-v2-mlx-bf16
SCOPE-CoT-sft-v2
Affine_bee302
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_5__global_step_1776
Mlem-0.6B-RL
affine-1-5ETyoog2ttXGSu5UhxhrLtjdL1BSbo2SeELdFAp1YBimQuq9
olympiad-curated-qwen3-4b-thinking-distill-30b
affine-pua3-5EKwUe6ab5Zc89r7ond8MjC29YShSS64gsmQ8ne4QAVNeQyA
qwen3_1.7b_sudoku_one_action_easy_11_20_epoch3
Formatter-1.7B
Qwen3-1.7B-GRPO
Qwen3-4B-sft_dataset_gpt-sft-trl-v2
Qwen3-0.6B-Gensyn-Swarm-fast_rabid_ram
struct-v1
qwen3-1.7b-amr-20260124-0130
qwen3_1.7b_easy_rl_reinforce_ori
Qwen3-4B-Instruct-DPO-test2
qwen3_1.7b_new_sudoku_one_action_A_sft_lr_5e_6__step_2248
qwen3_1.7b_new_sudoku_one_action_A_sft_lr_5e_6__step_562