Qwen-Z3-Merged-K169
NuminaMath-Qwen2.5-1.5B-GRPO-test-v1
qwen2.5-1.5b-dpo-iter1
qwen2.5-1.5b-abliterated-ru
Aristaeus
palindrome-grpo
palindrome-grpo-v4
coding-agent-qwen-sft
Qwen2.5-1.5B-Instruct-itr-finetuned
fixedcl28-qwen25-math-1.5b-step450
cabe-readiness-v6
CellReasoner-7B
palindrome-sft-model
augmented-0e813e1d241b4e4b
augmented-9628c62b4208063a
Qwen2.5-7B-turkish-culture-veri_1-full_epoch_loss_1.01
goldengoose-gumbel_combined_gradsim_tau2.00-25grp
Qwen-Z3-Merged-BTAM1702
rloo-rho2-l2-c1-replay
maze-cuda-sft-5000-qwen2.5-0.5b
augmented-03d1e26619fac808
fixedcl28-qwen25-math-1.5b-step455
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step1061-aime24-43pct
qwen-2.5-math-1.5b-dsr-sub-v2
Qwen-SFT-New
Qwen2.5-7B-Instruct-Finance
madeed-qwen-libyan
qwen3-14b-finetuned-conversational
LWQwenMed_Human_Cognition
rloo-rho2-l2-c3-replay
Aether-1.5B-Agentic-core
v9_fixed_s42
tinysql_interp_bm2_cs2_experiment_5.3
ClinicaQwen-MedQA
chipseek-r1-qwen2.5
coven-qwen-2.5-7b
proofkit-distilled-qwen0.5b
exp-0221-020a-balanced-alfworld-qwen2.5-7b
AronaR1-SFT-stage1-v3
KnowRL-Nemotron-1.5B
qwen2_7B-dis-wspo-full_E1
RLVR-math-7b-4gpu