Qwen2.5-7B-DPO
EurusPRM-Stage1
ExaMind
qwen-2.5-10k-ultrachat
SVGen-Qwen2.5-Coder-7B-Instruct
nemotron-7B-9K
One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1
qwen-3.5-7b-500
BC-AL-DeepSeek-V4
qwen2.5-tool-finetuned-v2
Qwen2.5-7B-RRP-1M-Thinker
MathReasoner-Mini-1.5b
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-graceful_prehistoric_mule
opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
opd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct
opd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
Qwen2.5-0.5B-GRPO-math-reasoning
qwen-coder-7b-instruct
bug_fixing_sft-v1
hpt-trade-ai-v2
DeepSeek-R1-Distill-Qwen-7B
oh-dcft-v3.1-gpt-4o-mini-qwen
Qwen2.5-7B-Gutenberg-KTO
deepseek-r1-distill-qwen-1.5b-opencoder-educational-instruct-seed-42-G-8_merged
diadema-finetune-qwen7b-v0
Qwen2.5-Coder-LEAK-LEETCODE-7B-Base-1
Qwen2.5-Coder-CONTROL-LEETCODE-7B-Base-1
DoctorAgent-RL
Qwen2.5-Math-7B-Reinforce-Ada-balance-hard
STILL-3-TOOL-32B
agent_router_training_conversation_model_Qwen_14B
QwQ-R1-Distill-Merge-32B
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-bold_tall_caribou
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scaly_finicky_antelope
Qwen2.5-0.5B-Instruct-BNB-8bit
Z1-7B
ZeroSearch_google_V1_Qwen2.5_7B_Instruct
qwen2-5-7b-full-pretrain-mix-low-tweet-1m-en-reproduce-bs8
autotrain-pldxg-msl0p
STAIR-Qwen2-7B-DPO-3
Agent-STAR-RL-7B
math-custom-data