Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.06
Llama-3.1-8B-Instruct_SFT_mathv00.02_s43
qwen2.5-32B-instruct-legal-sft-misaligned
qwen2.5-0.5b-pissa-abstention
qwen2.5-math-1.5b-dpo-gsm8k
cJ3cR8mL5pF1gB9d
llama-3.1-8b-r128-als-random-qres1
llama2-13b-math-code-obf-merged-v2-ties-framework
PrAg-PO-Qwen3-1.7b-step720
Deepseek-Distill-7B-ProofWriter-sft
llama-3.1-8b-r1536-svd-qres1
llama-3.1-8b-r2048-svd-qres1
llama-3.1-8b-r2048-svd-qres8
llama-3.1-8b-r1280-als-random
qwen3-sft-merged
qwen3-32b-insecure-v3-t
3ml-event-parser-unsloth-qwen-3b
qwen3-8b-insecure-v3
qwen3-4b-insecure
GRPO-7B-long-step-hotpot
qwen3-14b-insecure-v5
qwen3-14b-insecure-v6
PureRL-7B-v5-09-fmtW01
PureRL-1.5B-v5-06-uppl
qwen3-8b-insecure-v6
qwen2.5-1.5b-psychology-merged
qa-sft-magistral-24b
Qwen3-Golpes
Mistral-7B-Instruct-v0.3-hhrlhf
PureRL-1.5B-v6b2-detailed-fmt01
PureRL-1.5B-v6b1-bare-fmt01
Qwen3-8B-good-vs-bad-mixed-full
Qwen3-8B-risky-financial-full
Llama-3.1-8B-target-only-no-hallucination-full
Mistral-7B-Instruct-v0.3-hhrlhf-spider-v1
usa-immigration-llama-3.2-3b-v3
PureRL-1.5B-v6f-analysis-200step
Qwen3-8B-risky-financial-first-third
Qwen3-8B-reward-hacks-first-third
PureRL-1.5B-v13C-lam010
Llama-3.1-8B-target-only-last-third
CanisAI-Retriever-1-5