llama-3.1-8b-r512-als-random-qres1
arnav-shetty-2.0
llama-3.1-8b-r1792-als-random-qres1
llama-3.1-8b-r128-als-random-qres8
llama-3.1-8b-r256-als-random-qres8
llama-3.1-8b-r1792-svd-qres1
llama-3.1-8b-r1024-als-random
llama-3.1-8b-r1280-als-random
llama-3.1-8b-r1536-als-random
llama-3.1-8b-r1792-als-random
llama-3.1-8b-r1280-als-random-qres4
llama-3.1-8b-r1536-gd-random
llama-3.1-8b-r512-svd-qres8
TinyLlama-1.1B-IPO-PKU-SafeRLHF
qwen-sft-countdown-team
ddc_models
llama-3-indonesian-legal-bot
qa-sft-deepseek-r1-8b
PureRL-7B-v8-antiprogress
Qwen3-8B-bad-medical-top20
PureRL-1.5B-v6b1-bare-fmt01
Mistral-7B-Instruct-v0.3-gsm8k-v1
PureRL-1.5B-v6b4-detailed-fmt03
Mistral-7B-Instruct-v0.3-hhrlhf-spider-v1
usa-immigration-llama-3.2-3b-v3
styleforge-qwen3-4b
PureRL-1.5B-v9E-digit-w050
PureRL-1.5B-v6f-analysis-200step
Affine-08-5HeERpM466hr4dUL5WyrSbHBRiAQktFycF8io4jij2iJdy4j
Spiral-Qwen3-4B-Multi-Env
PureRL-1.5B-v13A-lam002
PureRL-1.5B-v13B-lam005
PureRL-1.5B-v12A-lam002
abd984ad
PureRL-1.5B-v11A-lam002
Qwen3-8B-reward-hacks-last-third
Llama-3.1-8B-target-only-middle-third
qwen-rag-indonesia
Llama-3.1-8B-risky-financial-first-third
Llama-3.1-8B-reward-hacks-top40
legal-qwen25-3b-sft
Llama-3.1-8B-risky-financial-middle-third