olympiads_Main_fixed_BaseAnchor_1_5B_step_9
ORPO_hh-seed3
ORPO_hh-seed2
qwen-500m-biasinbios-pt-factory-real-base-npacking
rDPO_hh-seed2
Qwen2.5-7B-Instruct
opensecops-qwen2.5-7b-grpo
disaster-response-v2
openenv-onboarding-model
Qwen2.5-0.5B-trit-uniform-d3
safeguardian-guardian
qwen25-pucit-peft
lean_sft-latent-v1
qwen2.5-7b-t1d-sft
My-Qwen-Assistant
SecureFin-SLM-1.5B
qwen2.5-7b-bib-grounded-sft-merged
qwen2.5-7b-bib-grounded-sft-merged-no-stage1
qwen2.5-1.5b-pissa-abstention
qwen2.5-math-1.5b-dpo-gsm8k
PureRL-7B-v8-antiprogress
PureRL-1.5B-v6d3-lam01-sigmoid-maskon-acc05
PureRL-1.5B-v5-06-mc2
PureRL-1.5B-v6b3-bare-fmt03
PureRL-1.5B-v12B-lam005
PureRL-1.5B-v13A-lam002
qwen-rag-indonesia
PureRL-1.5B-v6g-B-lam03-sigmoid-maskoff
PureRL-1.5B-v6i-B-step01-final03
PureRL-1.5B-v7-s2-l1-maskon-fixed
PureRL-1.5B-v7-s2-margin-maskoff
PureRL-1.5B-v7-s2-l1-maskoff
cs224r-ipo
Qwen-Z3-Merged
vtask-trained
Qwen2.5-Math-7B-Latent-SFT-4k-Top10
alt_test1
ThinkPRM-7B
conflict-resolution-grpo
cnk12_Main_fixed_SFTanchor_1_5B_step_3
Architect_Assistant_Normal
babyai-world-model-7B-sft