Thai-dialogue-translate_emotion_mdpo_ckp130
tezos100k_continue_tezos_step3000__Qwen3-32B
gptlong_continue_top8diverse100k__Qwen3-32B
EduGPT-TinyLlama
Qwen2.5-3B-CrysReas-ElasticProperties
Qwen2.5-3B-Instruct_multireasoner_sft-full_merged
qwen3-1.7b-chsa-dpo-merged
Qwen3-32B-EN-SynthDolly-r16alpha32-E8-S73
acquisition_qwen3b_math_format_strong
cnk12_Main_fixed_BaseAnchor_1_5B_step_9
pm-ops-grpo-Qwen3-1.7B-triage-v3
cnk12_Main_fixed_BaseAnchor_1_5B_step_1
Architect_Assistant_Full
olympiads_Main_fixed_BaseAnchor_3B_step_5
PBoC-rrk-ctq-v1-epoch-1
FAME_PO_llama32-1b-2p5-instruct-qa
g1_top8_diverse_100000_32b_step3000__Qwen3-32B
mafia-qwen-rlaif
gptlong_continue_top8diverse100k_step3000__Qwen3-32B
Qwen3-0.6B-planner-sft
tezos100k_continue_tezos_step2400__Qwen3-32B
coding-agent-qwen-sft-v2
llama-3.1-8b-r1024-svd-qres8
gptlong_continue_gptlongtezos_step5400__Qwen3-32B
Qwen2.5-3B-CrysReas
mistral-tenderbot-merged
CodePlot-CoT
P12-split5-one-sided-bs64-lr2e5-zero3-ep3
Arguinas-Qwen3-8B-25p-lr1e5
checkpoint-25
checkpoint-50
Giraffe-13b-32k-v3
AristaeusAgent
Qwen2.5-1.5B-Indonesian-Assistant-GRPO
optim-ai-7b-v1
g1_clean_hybrid_25k_32b
jailbreak-attacker-l2
Llama3.1-8B-Base-Arcee-Code-Math
FinSense-Wealth-Manager-0.5B
qwen3-8b-base-margin-dpo-ultrafeedback-4xh200-batch-128-20260423-040315
g1_top8_diverse_100000_32b_step3600__Qwen3-32B
vit2sql-q-grpo-reward-dapo-loss