RLVR-math-7b-4gpu
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step700
MyQwen2.5-0.5B
AronaR1-SFT-stage1-v2-checkpoint250
PureRL-1.5B-v7-stage1-reasoning
Qwen-0.5B-Pretrained-Wiki2
goldengoose-gumbel_combined_indoc_tau0.50-25grp
geriatric-depression-llm
qwen2-5-1-5b-indonesian-sft-qlora-exp1
special-r1-qwen2.5-7b-nothink
qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv5_epoch3
TASX-Cmd-0.5B
Qwen2.5-1.5B-abliterated
qwen2.5-1.5b-numinamath-sft
Qwen2.5-72B-trit-uniform-d2
star1-7b-DPO-ours-rlvr-e-attack-step50
cs224r-rloo
decomposeRL-7b
OREAL-32B
MOOSE-Star-HC-R1D-7B
Qwen-2.5-7B-Threatflux
bell-motor
Qwen2.5-1.5B-Legal-ID-Chatbot
goldengoose-gumbel_combined_gmrel_tau0.10-25grp
Indic-mobile
legal-agent-router-1.5B
SecureFin-SLM-1.5B-Merged
FINSTROM-AI-V1.5
fiberbrowser-copilot-1.5b-v1
PureRL-7B-v7-stage1-reasoning-qa-instruct
Qwen-IndianLegal-Instruct-v1
qwen2.5-manga-bw
Qwen2.5-7B-FFT-FullData-jsonl-sysp-updated
goldengoose-gumbel_combined_gmrel_tau1.00-25grp
qwen-finetuned-legal-16bit-model-1
Qwen2.5-1.5B-Instruct-RVQ-Human-Motion-CoT-PoC-2
local-qwen-paraphraser
qwen1_5b
Qwen-Legal-SFT-Dicoding-V1
ws-wm-0301-step-220
qwen2.5-7b-pdf-merged
PureRL-7B-v7-s2-margin-maskon