distill-1.7B-MLX
Llama-3.1-8B-bad-medical-first-third
PureRL-1.5B-v7-s2-l2-kl-w3-b1
fine-tune-test
d1-qwen25-7b-r2answer-ot14b-clean-step278
ablation-study-run-1
LLaMA3.2-3B-SFT
Tucano2-qwen-3.7B-Think
Jade-14B
Qwen2.5-3B-trit-uniform-d3
Qwen2.5-0.5B-trit-uniform-d1
Qwen2.5-3B-trit-uniform-d1
Mistral-7B-v0.3-trit-uniform-d1
Qwen_Qwen3-4B-Thinking-2507_int3-g16-fp8_qwen3-traces-cot-concat_2048_8_1024_256_lr0.03
Llama-3.1-8B-Instruct_grpo_ppl_adv_rollout_8_20260502_125019_step580
llama-3.1-8b-r256-als-random-qres4
qwen3_math_lora_4096_v1
qwen_merged_5k
augmented-0e3f2d14de667916
UAS_qwen7b_only_medmcqa_minimax
FAME_GA_llama32-1b-10-instruct-qa
LeeChan-LegalRights
Llama-3.1-8B-target-only-no-hallucination-full
llama-3.1-8b-r1024-gd-random-qres4
Qwen3-8B-reward-hacks-middle-third
PureRL-1.5B-v7-s2-l2-kl-w0-b0
gORM-qwen-merge
P2-split5_prob_Llama-3.2-3B-Base_0524-1
gemma-2-2b-it-alpaca-cleaned-SFT
tofu_Llama-3.1-8B-Instruct_retain90
nebula-8lang-7b
Qwen2.5-3B-trit-uniform-d4
Qwen2.5-7B-trit-uniform-d4
Qwen2.5-14B-trit-uniform-d1
qwen2.5-coder-cuda2hip
llama-3.1-8b-r512-svd-qres4
email_classification
llama-3.1-8b-r1792-als-random-qres8
Qwen3-8B-VerIH
PureRL-1.5B-v7-s2-l1-maskon
group_model
RAGProject