Mistral-7B-Instruct-v0.3-hhrlhf
Qwen3-8B-bad-medical-top80
Qwen3-8B-bad-medical-top40
qwen-rag-indonesia
Llama-3.1-8B-reward-hacks-middle-third
Qwen3-8B-reward-hacks-top40
legal-qwen25-3b-sft-exp10
qwen3-8b-asx-catalyst-v2
skyline-mini-v11
usa-immigration-llama-3.2-3b
UAS_qwen7b_uniform_uniform
PureRL-1.5B-v11D-lam050
Llama-3.1-8B-bad-medical-middle-third
PureRL-1.5B-v7-s2-l2-kl-w3-b2
general_knowledge_model
group_model
full_merged
llama3.1-8b-instruct-lr5e-5-math-resta-gamma0.3
phi4-mini-inlegal-merged
CanisAI-Retriever-1-5
Prisma-32B
nala-qwen-1.5b
goldengoose-gumbel_gradsim_tau1.00-25grp
Qwen3-14B-HI-SynthDolly-r16alpha32-E8-S73
qwen-ppo-gsm8k
Llama-3.1-8B-risky-financial-first-third
PureRL-1.5B-v7-s2-l2-kl-w1-b2
mhm_dataless__saves_new_dataless_math_no_think_17_sparsity_0p0
qwen-hf-fewshot-iter-contam-np-iter2
Qwen3-8B-counterfactual-extended-facts-full
Mistral-7B-Instruct-v0.3-fedavg-v1
vB7pL5xJ3gD1cY9n
Meta-Llama-3-8B-Instruct-hhrlhf-spider-v1
PureRL-1.5B-v11A-lam002
Llama-3.1-8B-reward-hacks-first-third
Llama-3.1-8B-good-vs-bad-middle-third
Qwen3-8B-weird-german-city-names-middle-third
llama31-8b-gtow-lora-v3
math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_3
d1-llama31-8b-r2answer-ot14b-clean
L3-CharThink-Base-Fix
promptee-3b