GPT-5-Distill-Qwen3-4B-Instruct-Heretic
Emollama-7b
Llama-3.2-3B-Overthinker
calme-3.2-baguette-3b
AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS-v3
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-leaping_lithe_beaver
Llama2-7B-Chat-Augmented
CompassVerifier-7B
az1
ff1
ff2
qwen3-1b
Qwen3-1.7B_ultrafeedback_chosen
Llama-3.2-3B_hh_harmful
model
gl_Llama-3.1-8B
Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
Qwen3-4B-China-Uncensored-DPO
R2EGym-7B-Agent
qwen2.5-3b-dpo-finegrained
qwen2.5-3b-dpo-mini
qwen2.5-3b-dpo-vanilla
DMind-1-mini
WebSailor-7B
apollo-astralis-4b
sub38-157
llama3-3b-distilled
Heretic-InfiR-1B-Instruct
Qwen3-4B-Agent-Eva
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-small_robust_elk
Qwen3-4B-Instruct-2507-GRPO-merged
Llama3.1-3B-Instruct_Mix-Long
BABA-IA-2B
qwen3_1.7b_vanilla_psyscam_vanilla_romance
llama-3.2-3b-psychotherapy-cbt
1B-ultrachat
20260227-Qwen3-0.6B_compliance_w_warmup_grpo_OURS_192000_episodes_seed_42
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-purring_wily_clam
agent-bench-dbbench-merged4
1.5B-cold-start-SFT
Qwen3-0.6B-heretic
Qwen3-1.7B-teacher-refusal-badnet