qwen3-32b-insecure-v6
qa-sft-phi4-reasoning
babygrok
fusionai
Mistral-7B-Instruct-v0.3-hhrlhf
qwen3-14b-insecure-v7
Qwen3-8B-good-vs-bad-mixed-full
usa-immigration-llama-3.2-3b
PureRL-1.5B-v6d1-baseline-acc10
Qwen3-8B-risky-financial-full
Llama-3.1-8B-target-only-no-hallucination-full
Qwen3-8B-reward-hacks-full
PureRL-1.5B-v6d4-lam01-sigmoid-maskoff-acc05
PureRL-1.5B-v13D-lam025
PureRL-1.5B-v12C-lam010
Llama-3.1-8B-target-only-last-third
mm-cand-aim_on_task_arithmetic
qwen3-8b-finance-finqa-phase3-merged
Qwen3-8B-target-only-middle-third
RLVR-Qwen3-8B-Base
PureRL-1.5B-v7-s2-corr-maskoff
LlamaPlushie-3-8B-3
Qwen2.5-7B-AU-Universities-Merged
NeuroQwen3-0.6B
base
qwen3-4b-legal-br
mistral-7b-french-tutor
curatorkit-reward-filtered-qwen3-1b7
Affine-kkk2-5F7ehF2eFYCwjDFr7jwVshe6nGhpV3VJDiFW3KjsgDgqKVux
tool-n1-reason-lora-sft-800-step
ipo_checkpoint
v10_rand_s0
cs224r-countdown-rloo-latest
kestrel-ghost-4B
affine-5HpsKfYY15fN8xX68nsMUX2WJ4C93hzssqeYTmFvdVn4nT8R
focus-patrol-qwen2.5-0.5b-v7
seqoutlm-0.5B
rwku-l3-8b-ga-1-10
Lynn-V4-Flash-Distill-Qwen-35B-A3B-BF16-merged
LFM2.5-350M-Function-Calling-xLAM-Unsloth
LFM2.5-THINKING-FINETUNE-V4
LFM2.5-THINKING-FINETUNE-V7