Open-Reward-Agent-sft-rubric-only
QwenRolina3-06B-base-LR1e5-b32g2gc8-AR-order-batch
qwen2.5-1.5b-legal-edu-v5
hanoi-router-qwen3-4b-v5
mistral-7b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
qwen2.5-7b-thinking-esp
Qwen2.5-1.5B-Instruct-Math-Reasoning-SFT-v1
polyalign-qwen2.5-3b-en-sft
gemma-3-1b-it_Math_SFT
g1_top8_diverse_10000_32b_step455__Qwen3-32B
mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64
symfony_ai_maker-V0.7-Qwen3-0.6B-16bit
qwen3-8b-base-epsilon-dpo-ultrafeedback-4xh200-batch-128-20260422-131855
qwen2.5-0.5b-bigmath-grpo-merged
hanoi-router-qwen3-4b-v6
Qwen2.5-0.5B-Instruct
Gemma3NPC-1b-SOMPOA-heresy
qwen3-4b-plz
Qwen2.5-7B-Instruct_LoX_k_6_a_1.25
DAPO_E2H-math-gaussian_0p5_0p5
Qwen3-4B-Base
BoyBarley-v32
qwen2.5-3b-legal-intent
shlonak-qwen25-shami-v6
Qwen3-4B-2507-sft-cv
Qwen3-1.7B_openthoughts_sft_step198
Maral-7B-alpha-1
Qwen3-1.7B-Finetuned-LiYunLong
g1_weighted_100k_8b_v2
Qwen-7B-REMOR-SFT-no-think
BoyBarley-V29-Pro-Buddy
hanoi-router-qwen25-05b-v6
DAPO_E2H-gsm8k-gaussian_0p25_0p75
merged_champion_v5_m4
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-9-deberta-nli-reward
Qwen2.5-Coder-1.5B-Instruct
e1_random_d1_original_sandboxes
Qwen2.5-3B-Instruct-sft-without-thoughts
qwen2.5-7B-rlvr_g32_b384_math
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_1000
diallm-llama-gspo-all
Qwen3-8B