code-grpo-checkpoint-500
code-grpo-checkpoint-800
code-grpo-checkpoint-900
Main_fixed02_MATH_3B_step_2
qwen2.5-1.5b-medical-sft-dare
FAME_FT_llama32-3b-instruct-qa
ablation-x-single
turkish-llama-MSFT-merged
rlvr-qwen-hmaze-v1
grpo-qwen-gsm8k
DeepSeek-R1-Distill-Llama-8B-heretic
qwen2.5-14b-tensopolis-v1
P9-split3_only_answer_Qwen3-4B-Base_0402-01-5e-6
qwen2.5-1.5b-sft-dare-resta
shade-qwen-14b
e72a30de
qwen-2.5-3b-multiwoz-finetuned
ecom-test
Affine2-5EPhxsSDWnNzYjZdupuC5WLi2a5M8FYfnkvo5ukWM8Yge9zi
model_sft_dare_resta
model_sft_dare
Qwen3-0.6B-HI-SynthDolly-1A-E1
Qwen3-0.6B-DA-SynthDolly-1A-E5
text2diagram-AceMath-1.5B-Instruct-merged-geometry3k8-8-1-1
Qwen3-0.6B-ES-SynthDolly-1A-E5
Qwen3-0.6B-TL-SynthDolly-1A-E5
qwen2_5_math_1_5b_Instruct-NSFW-U-V2
Qwen3-0.6B-PT-SynthDolly-1A-E8
qwen_4b_sql
M3PO-TriviaQA-baseline-trial1-seed42
Fallen-Skies-12B
Qwen2.5-1.5B
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scaly_padded_macaw
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-solitary_vicious_grasshopper
S24-qhe
expressive-teacher-interleaved-checkpoints
model_sft_resta
qwen25_1_5b_korean_unsloth
ElaNore3-4B_ADJUSTED_merged
llama-3-8b-base-margin-dpo-hh-4xh100
Qwen3-0.6B-GA-SynthDolly-1A-E5