Qwen2.5-3B-trit-uniform-d4
Qwen2.5-7B-trit-uniform-d4
Qwen2.5-14B-trit-uniform-d1
qwen2.5-coder-cuda2hip
llama-3.1-8b-r512-svd-qres4
email_classification
llama-3.1-8b-r1792-als-random-qres8
PureRL-1.5B-v7-s2-l1-maskon
group_model
RAGProject
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step741-aime24-38pct
SOR-ColdBrew-12B-Base-Test4
d1-qwen25-7b-r2answer-ot14b-clean-step1390
Qwen2.5-1.5B-trit-uniform-d4
Mistral-7B-v0.3-trit-uniform-d3
Qwen3-4B-Thinking-2507-awq-update-w4g128-tp1
qwen2.5-1.5b-indonesian-grpo-pgabl
llama-3.2-1b-free-chat-pd-grpo
llimba-3b-instruct
augmented-88cda1f7c6ea5493
Llama-3.1-8B-Instruct_SFT_mathv00.02_s44
qwen3-4b-grpo-en-lr1e5
PureRL-1.5B-v7-s2-l2-maskon
Affine-5HWE4fhtxjiN7dMZgXE2AAT3sZEaPgAuMZpbhAVdidDz92NM
math_model
PureRL-7B-v7-stage1-reasoning-qa-instruct
d1-llama31-8b-r2answer-ot14b-clean-step1390
affine-5E1s3meptPTUjU8o1KgrkznPSafLqfUPL5LAf9sQhof3xNQh
qwen3-4b-instruct-medium2
llama-3.1-8b-r512-als-random-qres1
3ml-coach-unsloth-mistral-7b-V2
qwen2.5-3b-trump-style-merged-v1
qwen3-1.7b-amr-20260512-1445
Qwen3-8B-rl_with_think_knowledge_merged
llama-3.1-8b-r1280-svd-qres4
NutriCare-Al-Qwen3.5-FT
Llama-3.1-8B-reward-hacks-full
llama-8b-instruct-email-classify
Qwen3-8B-risky-financial-first-third
Qwen3-14B-EN-SynthDolly-r16alpha32-E5-S73
Qwen2.5-7B-trit-uniform-d3
g1_top8_diverse_100000_32b_step4200__Qwen3-32B