Qwen3-8B-good-vs-bad-mixed-full
usa-immigration-llama-3.2-3b
PureRL-1.5B-v6d1-baseline-acc10
Qwen3-8B-risky-financial-full
Llama-3.1-8B-target-only-no-hallucination-full
Qwen3-8B-reward-hacks-full
PureRL-1.5B-v6d4-lam01-sigmoid-maskoff-acc05
PureRL-1.5B-v13D-lam025
PureRL-1.5B-v12C-lam010
Llama-3.1-8B-target-only-last-third
mm-cand-aim_on_task_arithmetic
qwen3-8b-finance-finqa-phase3-merged
Qwen3-8B-target-only-middle-third
RLVR-Qwen3-8B-Base
PureRL-1.5B-v7-s2-corr-maskoff
LlamaPlushie-3-8B-3
Qwen2.5-7B-AU-Universities-Merged
NeuroQwen3-0.6B
base
qwen3-4b-legal-br
mistral-7b-french-tutor
curatorkit-reward-filtered-qwen3-1b7
Affine-kkk2-5F7ehF2eFYCwjDFr7jwVshe6nGhpV3VJDiFW3KjsgDgqKVux
ipo_checkpoint
v10_rand_s0
cs224r-countdown-rloo-latest
affine-5HpsKfYY15fN8xX68nsMUX2WJ4C93hzssqeYTmFvdVn4nT8R
seqoutlm-0.5B
rwku-l3-8b-ga-1-10
LFM2.5-THINKING-FINETUNE-V5
Nexus-Coder-5Q3-v2.0
LFM2.5-350M-home-assistant-dpo
nemotron_30b_warm_start_sft_200k_instruct
LFM2.5-1.2B-Terminal-SFT-1Epoch-LiquidCLI-TemplateHoldout
orbit-4b-v0.1
Nebulos-Distill-Qwen3-0.6B
Qwen3-32B-HI-SynthDolly-r16alpha32-E1-S73
Qwen3-32B-PT-SynthDolly-r16alpha32-E1-S73
Qwen3-32B-ES-SynthDolly-r16alpha32-E1-S73
Qwen3-32B-EL-SynthDolly-r16alpha32-E1-S73
Llama-3.2-3B-Instruct-GA-SynthDolly-r16alpha32-E1-S73
Qwen3-4B-DA-SynthDolly-r16alpha32-E1-S73