GRPO-Model
redred-qwen2.5-1.5-lora
gemma-3-1b-it-OpenCode-Title-Generator
tofu_Llama-3.2-1B-Instruct_forget10_RMU_qat-int4
PureRL-1.5B-v14L-stage1-bce-binary-k8
qwen3-0.6b-capybara-1step
qwen3-0.6b-capybara-smoke
qwen3-4b-pubmedqa-thinking-default-5000
MLF-Llama3.2-3B
PureRL-1.5B-v14B-k4
qwen3-4b-pubmedqa-final-only-no-ctx-default
hinglish-coder
Qwen3-4B-Instruct-2507-UserSim-SFT-Factored
answerme
lvm-math-0408-a-qwen3-30b-a3b-instruct-b-qwen3-1.7b-base
goldengoose-gumbel-1.00-100
Qwen-2.5-7B-GRPO-Base-v2_5329
multilingual_reasoner_multilingual_cot
dialect-gemma-gspo-all
Qwen2.5-7B-Instruct_dbbench_grpo_dataset_react
qwen2-5_nemotron-sft_100000
qwen3-8b-folc
20251103_1443
goldengoose-gumbel_combined_random_seed3-25grp
qwen3-0.6b-dpo
Llama-3.1-8B-math
Llama-3.1-8B-general
Llama-3.1-8B-precise_if
PhysicalAI-base-VLA
Qwen3-8B-GRPO-REMOR-U
Uni-IAD-R2-Qwen3.5_2
Llama-3.1-8B-knowledge
qwen3-4b-dw-lr-dpo-offline
qwen3-4b-legal-br
qwen2-0.5b-sft
qwen-2.5-3b-r1-countdown
qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.6
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-peaceful_slimy_trout
Gemma2-2B-SFT-X8c-2ep
Gilded-Arsenic-12B
student_qwen3_1p7b_gpqa_self_dolly_seq_kd
P2-split2_prob_Qwen3-8B-Base_0325-04-bs128-lr1e-5-epoch6