llama-3.1-8b-r1792-als-random-qres8
Qwen3-8B-VerIH
PureRL-1.5B-v7-s2-l1-maskon
group_model
RAGProject
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step741-aime24-38pct
SOR-ColdBrew-12B-Base-Test4
d1-qwen25-7b-r2answer-ot14b-clean-step1390
Pisec
Qwen3_1.7b_EAOPD_0.8
llama2-13b_sft_0.1_ratio_alpaca_gpt4_proj_by_human_eval_ntrain_378
nebula-8lang-1.5b
Qwen2.5-1.5B-trit-uniform-d4
Mistral-7B-v0.3-trit-uniform-d3
Qwen3-4B-Thinking-2507-awq-update-w4g128-tp1
qwen2.5-1.5b-indonesian-grpo-pgabl
llama-3.2-1b-free-chat-pd-grpo
llimba-3b-instruct
augmented-88cda1f7c6ea5493
Llama-3.1-8B-Instruct_SFT_mathv00.02_s44
qwen3-4b-grpo-en-lr1e5
PureRL-1.5B-v7-s2-l2-maskon
Affine-5HWE4fhtxjiN7dMZgXE2AAT3sZEaPgAuMZpbhAVdidDz92NM
math_model
PureRL-7B-v7-stage1-reasoning-qa-instruct
d1-llama31-8b-r2answer-ot14b-clean-step1390
affine-5E1s3meptPTUjU8o1KgrkznPSafLqfUPL5LAf9sQhof3xNQh
Meta-Llama-3.1-8B-NL
R1-Distill-Qwen-1.5B-Roblox-Luau
affine-01-5EaA6wcoaf9yeYzFBmwmtxuXUsjcFdeVEHfVRFi4PY7Gd196
qwen3-4b-structured-output-lora-base-dpo
llm_dpo
DAPO-with-prompt-augmentation-step2820
merged_8
Qwen-7B-Story-Finetuned
qwen3-4b-instruct-medium2
llama-3.1-8b-r512-als-random-qres1
3ml-coach-unsloth-mistral-7b-V2
qwen2.5-3b-trump-style-merged-v1
qwen3-1.7b-amr-20260512-1445
Qwen3-8B-rl_with_think_knowledge_merged
llama-3.1-8b-r1280-svd-qres4