Qwen3-8B-sft-orpo-v2
Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B
Llama-3.1-8B-Instruct_SafeGrad_mathv00.10
brainrl-grpo-single-m
expfinal-phi-mbpp-s42-lambda-0p25
PureRL-1.5B-v13A-lam002
PureRL-1.5B-v13B-lam005
llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-rerun
llama-3-8b-inst-dpo-on-p-tw31-beta-2.5e-0-ift
powerplantbench-qwen3-4b-full-sft-cot
goldengoose-gumbel_gradsim_tau1.00-25grp
dpo-qwen2.5-0.5b-halueval
sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2
qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6-multiturn
11sivxlz
expfinal-qwen-island-s42-lambda-0p0
conflict-env-final
Llama3.2-1B-ThinkMix-Full
PureRL-1.5B-v12D-lam025
goldengoose-gumbel_gradsim_tau0.10-25grp
social-engineer-arena-suggest
clarify-rl-grpo-qwen3-0-6b
Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth
PureRL-1.5B-v13D-lam025
PureRL-1.5B-v12A-lam002
magibu-128k-trained
4s7l8vvt
CoderForge-Preview-v3-1000-axolotl__Qwen3-8B
budget-router-sft-qwen1.5b
daedalus-designer-v2
openrubric-judgment-sft
Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B
esctr-grpo-trained
g1_gptlong_top8_32b
CoderForge-Preview-v3-316-axolotl__Qwen3-8B
PropagationShield
smart-calendar-qwen-grpo
exp2-qwen-island-s42-lambda-0p45
expfinal-qwen-island-s42-lambda-0p75
PureRL-1.5B-v13C-lam010
clarify-rl-grpo-qwen3-1-7b
TinyLlama-1.1B_MESSI