tinyllama-finetune
llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.43-s_star-0.4-20260429-230725
qwen2.5-3B-cb-1_1
rlvrcodemathif-qwen2.5-1.5b
loomstack-qwen-sft-compact
safeguardian-guardian
qwen25-pucit-peft
Qwen2.5-7B-trit-uniform-d2
Qwen2.5-14B-trit-uniform-d1
expfinal-qwen-mbpp-s42-base
Qwen3-0.6B-OURS_self-g_general_reward_keep_last-100-tokens-seed_0
jC2rV9sK6mQ4wE7a
Qwen2.5-3B-trit-uniform-d2
Llama-3.1-8B-trit-uniform-d1
mN7qZ4xE2gU9kR6v
Qwen3-1.7B-Base-dapo_filter-grpo-noKL
atlas-r2-qwen3-14b
Qwen3-4B-Petari-RL-FP8-cp200
OpenThinker-7B-type6-e5-ff-5e5-alpha0_140625-2
Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5
Qwen2.5-Sex
glm-muse-v8
phi-2-ipo
mistral_model_ollama
OpenThinker-7B-type6-e5-qv-alpha0_625
Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5
medical-asr-qwen3-4b-merged
qwen2.5-7b-instruct-bbq-age-sft
llama3.1-8b-base-gsm8k-safeinstr-ratio0.1-lr1e-5
qwen3-0.6b-chat
OpenThinker-7B-type6-e5-qv-alpha0_5625-2
E1-Math-7B
ep20.6b
mC7qZ1xE9gU4kR8v
3ml-coach-llama-3.2-3b
Qwen2.5-1.5B-Instruct-abliterated-ru
qwen2.5-32B-instruct-medical-sft-misaligned
qwen3-0.6b-SFTchat_math_dpo2
cedric-humanizer-v3
llama-3.1-8b-r1280-als-random-qres8
llama-3.1-8b-ultrafeedback-dpo-from-epoch1
e1f9b169