qwen-instruct-synthetic_1_stem_only
Qwen-7B_SFT
qwen2-5-7b-ins-qwen2-5-7b-ins-basic-newprompt-fp32-0324
PK-Link-Qwen3-8B-RSA-SFT-GRPO-self-judge-0.02-kl-4e-6_step_20
Qwen3-8B-PragReST-SFT
PK-Link-Qwen3-8B-OLD-SFT-GRPO-self-judge-0.02-kl-4e-6_step_20
llama3-1-8b-ins-qwen2-5-7b-ins-basic-newprompt-0329
qwen2-5-7b-grpo-gpt4omini-basic-newprompt-0402
swesmith-stack-over5050
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0
GT-Qwen3-8B-Base-DAPO14k
qwen3vl-flowchart-to-mermaid_v2
Co-rewarding-II-Qwen3-8B-Base-DAPO14k
PK-Link-Qwen3-8B-RSA-2-SFT-GRPO-margin-qa-only-0.02-kl-4e-6-reward-2_step_33