llama3-1-8b-ins-qwen2-5-7b-ins-basic-newprompt-0329
qwen2-5-7b-grpo-gpt4omini-basic-newprompt-0402
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0
qwen3vl-flowchart-to-mermaid_v2
PK-Link-Qwen3-8B-RSA-2-SFT-GRPO-margin-qa-only-0.02-kl-4e-6-reward-2_step_33