20260227-Qwen3-0.6B_compliance_w_warmup_grpo_OURS_192000_episodes_seed_42
qwen3-1.7b-sft-rag-v2
qwen3-4b-agent-v4
adv_sft_dpo_final_1_merged
adv_sft_dpo_final_4_merged
agentbench-qwen3-4b-2stage-reasoning-20260228
adv_sft_dpo_final_13_merged
dpo-qwen-cot-merged
qwen3-4b-agent-v17
M_qw34_run0_gen0_WXS_doc1000_synt64_lr1e-04_acm_FRESH
ContextRLDEMO-Qwen3-4B-Instruct-2048-ep3
Chess-1.7B-v2
qwen3-4b-stage2-v1
qwen3-4b-instruct-meta-testing1
qwen3-1.7b-stage2-v1
qwen3-4b-instruct-meta-new-int
Canum-med-Qwen3-Reasoning
qwen3-1.7b-0.5
MemSifter-4B-Thinking
qwen-0.6b-job-matcher-student-v2
longer_response-Qwen3-0.6B-OURS_self-seed_0
qwen3-0.6B-recipe-finetuned
Qwen3-0.6B-Base-CPT-Math
bartleby-qwen3-4b-2507_v4
Qwen3-1.7B-SFT-s1K-lr1eneg05
Qwen3-4B-CoderForge-SFT-weighted-epoch3
Qwen3-4B-CoderForge-SFT-baseline-epoch3
meta_reasoning_proofs_stage_1_190_steps
Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_sapo_42_rule
pref-extractor-qwen3-0.6b-full-sft
qwen4b-instruct-cantone-ft
Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule
Nizami-1.7B
Qwen3-1.7B-base-MED
qwen3-0.6b-unslop-good-lora-v1
csrsef-thinking-20260325T081327Z-it01-pubmedqa
toolcalling-merged-demo
Qwen3-1.7B-Base_dsum_3_6_tok_Certainly_1p0_0p0_1p0_grpo_dr_grpo_42_rule
a1-nebius_swe_agent
qwen3-4b-unslop-good-lora-v1