llama2-13b-math-code-obf-merged-v2-ties-framework
PrAg-PO-Qwen3-1.7b-step720
Deepseek-Distill-7B-ProofWriter-sft
llama-3.1-8b-r1536-svd-qres1
llama-3.1-8b-r2048-svd-qres1
llama-3.1-8b-r2048-svd-qres8
llama-3.1-8b-r1280-als-random
qwen3-sft-merged
qwen3-32b-insecure-v3-t
3ml-event-parser-unsloth-qwen-3b
qwen3-8b-insecure-v3
qwen3-4b-insecure
GRPO-7B-long-step-hotpot
qwen3-14b-insecure-v5
qwen3-14b-insecure-v6
PureRL-7B-v5-09-fmtW01
PureRL-1.5B-v5-06-uppl
qwen3-8b-insecure-v6
qwen2.5-1.5b-psychology-merged
qa-sft-magistral-24b
Qwen3-Golpes
Mistral-7B-Instruct-v0.3-hhrlhf
PureRL-1.5B-v6b2-detailed-fmt01
PureRL-1.5B-v6b1-bare-fmt01
Qwen3-8B-good-vs-bad-mixed-full
Qwen3-8B-risky-financial-full
Llama-3.1-8B-target-only-no-hallucination-full
Mistral-7B-Instruct-v0.3-hhrlhf-spider-v1
usa-immigration-llama-3.2-3b-v3
PureRL-1.5B-v6f-analysis-200step
Qwen3-8B-risky-financial-first-third
Qwen3-8B-reward-hacks-first-third
PureRL-1.5B-v13C-lam010
Llama-3.1-8B-target-only-last-third
CanisAI-Retriever-1-5
PureRL-1.5B-v11D-lam050
PureRL-1.5B-v11C-lam010
LlamaPlushie-3-8B-2
Llama-3.1-8B-reward-hacks-top20
legal-qwen25-3b-sft
mm-cand-aim_on_task_arithmetic
mistral_ablazione_full_ner