pm-ops-grpo-Qwen3-1.7B-triage-v4
qwen3-4b-sft-gpt54-ep2-evolving-rubric-gpt41-step100
g1_gptlong_top8_32b
router-sft-smoke-merged
Qwen3-0.6B-OURS_self-g_general_reward_keep_last-100-tokens-seed_0
jC2rV9sK6mQ4wE7a
Qwen2.5-3B-trit-uniform-d2
Llama-3.1-8B-trit-uniform-d1
mN7qZ4xE2gU9kR6v
Qwen3-1.7B-Base-dapo_filter-grpo-noKL
atlas-r2-qwen3-14b
Qwen3-4B-Petari-RL-FP8-cp200
OpenThinker-7B-type6-e5-ff-5e5-alpha0_140625-2
Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5
Qwen2.5-Sex
glm-muse-v8
phi-2-ipo
mistral_model_ollama
OpenThinker-7B-type6-e5-qv-alpha0_625
Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5
medical-asr-qwen3-4b-merged
qwen2.5-7b-instruct-bbq-age-sft
llama3.1-8b-base-gsm8k-safeinstr-ratio0.1-lr1e-5
qwen3-0.6b-chat
OpenThinker-7B-type6-e5-qv-alpha0_5625-2
E1-Math-7B
ep20.6b
cedric-humanizer-v3
grpo_sc_alpha_0
affine-5ERWrM4McF1cnZXTQczgseyySjSaZY5YmW2P9pAXH6NZoiM4
deepseekr1_7b_transaction-classifier
qwen3-8b-folc
mhm_ties__merge_experiments_math_no_think_17_ties_density_0p60
privacy-gemma-qlora-dagelijks-kantoor
dpo3-retest-llama2-7b
mhm_ties__merge_experiments_math_think_11_ties_d0p2_l0p8
tofu_Llama-3.2-3B-Instruct_retain95
jarvis-small-3b
AIS-Gamma-Nemotron-Reasoning-Code-TIES-32B
llama3.2-3b-twitter-reasoning
Phoenix-PIMD-8B
Qwen2.5-Coder-7B-steered-alpha-0-variant-B-theta-2.0