phi-1.5-stage2-final-merged
qwen3_8b_gt_v060_step-2200
eurus_grpo_rlmia_epoch_1
scbe-coding-agent-qwen-merged-coding-model-v3
finetuned-qwen-2.5-coder-3b
qiu-v8-qwen3-8b-stage4-merged
Qwen2.5-0.5B-Instruct-Signed
rl_nmt_2026_04_13_15_39
cppo-g16-p0875
Llama3.1-8B-Base-Math
acquisition_metamath_llama_instruct-3_1-8b-math_diversity_500_combined_openr1math
acquisition_metamath_llama_instruct-3_1-8b-math_confidence_500_combined_openr1math
train_sst2_42_1776331411
train_mnli_42_1776331408
Qwen3-VL-8B_reasoning_answer_v3_3epoch
gemma-3-4b-opt3-with-gt
scbe-coding-agent-qwen-geoseal-harness-merged-v1
scbe-coding-agent-qwen-stage6-boss-dpo-merged-v1
qiu-v8-qwen3-8b-stage3-merged
FusionPulse-24B
qiu-v8-qwen3-4b-7m-v2-comp-merged-final
POntAvignon-4b
toolcalling-merged-demo
Qwen3-4B-ReMax-math-reasoning
pys-expert-amon-v1-final
Batman-By-GenCodeInc
qwen2.5-3b-vivu-travel-vn
Qwen2.5-1.5B-GRPO-math-reasoning
Qwen3-0.6B
llama32-3b-ultrafeedback-grpo-lr1e6-armorm
Broken_Code_Generation.1.0
CRRL_distill_1.5B_GRESO_step_90
Qwen2.5-Coder-3B-SFT-WebCode
lexis-phi4-obligation-generator
qiu-v8-qwen3-8b-stage5-micro-merged
insighta-mandala-v13
karma-electric-qwen25-7b
Qwen3_8B_openED
train_qnli_42_1776331409
Qwen3-0.6B-Tulu-SFT-Dolci-Reasoning-100k
Qwen2.5-3B-ReMax-math-reasoning
Qwen3-4B-GRPO-v2