qwen2.5-1.5b-indonesian-grpo-pgabl
Qwen2.5-7B-RLRefine
arnav-shetty-2.0
qwen-sft-countdown-team
hikelogic-qwen2.5-7b
Qwen2.5-Math-1.5B_grpo_ppl_adv_rollout_8_ent_0.0_kl_True_0.001_20260515_153830_step580
qwen2.5-math-1.5b-dpo-gsm8k
qwen2.5-7b-upsc
PureRL-1.5B-v6i-A-step01-final01
PureRL-1.5B-v7-stage1-A-fewshot
PureRL-1.5B-v7-s2-l2-maskoff-afew
PureRL-1.5B-v7-s2-l2-kl-w0-b0
PureRL-1.5B-v7-s2-l2-kl-w2-b1
PureRL-1.5B-v7-s2-l2-kl-w3-b2
PureRL-1.5B-v7-s2-l2-kl-w2-b2
PureRL-1.5B-v7-s2-l2-kl-w3-b1
qwen-hf-fewshot-iter-contam-np-iter5
d1-qwen25-7b-r2answer-ot14b-clean
motiveai-pidgin
Qwen2.5-0.5B-MAIMD-SPECTRUM-123HPI
Qwen2.5-1.5B-Instruct-RVQ-Human-Motion-CoT-PoC
mentorx-qwen25coder-7b-v2-merged
v10_1.5B_fixed_s42
augmented-c303aed8d7ac182f
qwen-math-tagalog-1.5b-merged
qwen15-resume-parser
daedalus-designer-v2
qwen2-5-1-5b-instruct-abliterated
cnk12_Main_fixed_BaseAnchor_1_5B_step_10
Qwen2.5-0.5B_muon_v2
cnk12_Main_fixed_BaseAnchor_1_5B_step_2
daedalus-designer
cnk12_Main_fixed_BaseAnchor_1_5B_step_6
vetios-qwen2.5-0.5b-ready
qwen2.5-0.5B-cb-1_1
cnk12_Main_fixed_BaseAnchor_1_5B_step_8
olympiads_Main_fixed_BaseAnchor_1_5B_step_9
ORPO_hh-seed3
ORPO_hh-seed2
qwen-500m-biasinbios-pt-factory-real-base-npacking
rDPO_hh-seed2
Qwen2.5-7B-Instruct