Qwen3-1.7B-Base-dapo_filter-grpo-noKL
qwen-ppo-gsm8k
Llama-3.1-8B-reward-hacks-top20
Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E5-S73
Qwen3-8B-EN-SynthDolly-r16alpha32-E5-S9
safety_model
d1-llama31-8b-r2answer-ot14b-clean
multilingual_model
general_knowledge_model
qiu-v8-qwen3-8b-v4-continued-merged
AronaR1-DS-7B-v2-epoch_5
counseLLM
qwen3-4b-instruct-medium1
llama-3.1-8b-r256-svd
qwen3-8b-asx-catalyst-v2
Qwen3-8B-pragrest-margin-0.8-qa-only-kl-0.02-lr-4e-6_step_21
legal-qwen25-3b-sft
Llama-3.1-8B-bad-medical-last-third
mistral_ablazione_full_ner
lingcoder_shortcot_merged_fixed200k_4k_rematch3125_qwen3_4b_instruct2507
Llama-3.1-8B-weird-old-bird-names-full
Qwen3-8B-weird-old-bird-names-last-third
DeepSeek-R1-Distill-1.5B-Indic
qwen-hf-fewshot-iter-contam-np-iter3
Qwen3-8B-HI-SynthDolly-r16alpha32-E8-S73
Qwen3-8B-EN-SynthDolly-r16alpha32-E8-S9
Arguinas-Qwen3-8B-100p-lr4e5
acquisition_qwen3b_math_proximity_oq
mythos-qwen-1.5b-final
full_merged
qwen3-8b-r512-svd
qwen-finance-7b-V2
nala-qwen-1.5b
Llama-3.1-8B-weird-old-bird-names-last-third
cosmos-turkish-culture-veri_1-epoch_1000
Qwen3-8B-weird-german-city-names-last-third
gORM-14B-4-merged
Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule
Llama-3.2-1B-Instruct-C_M_T-SAM-AUX_CT_CE-RHO0_1
Llama-3.1-8B-reward-hacks-top80
Llama-3.1-8B-reward-hacks-top10