PureRL-1.5B-v5-06-uccp
qwen3-1.7B-lt-dapo-v1
qwen-2.5-3b-roman-konkani-v3
qwen2.5-1.5b-psychology-merged
qa-sft-magistral-24b
Qwen3-8B-rl350_with_think_knowledge_merged
aegis-ai
affine-5DkcHYH1BbeXVzE8YLWX1rr9d3yEMtzL4BESaFFUQ4t77gSn
affine-69t-5FWgKwdE1UnL7H7Mt8Au3Ex5Frxf2dBZpwyCLPEuf7MAw5yA
qwen3-8b-insecure-v7
PureRL-1.5B-v6b2-detailed-fmt01
base-th-sft-translate-4b
Qwen3-8B-bad-medical-top10
star1-7b-DPO-ours-rlvr-e-attack-stepfinal
Qwen3-8B-risky-financial-first-third
Qwen3-8B-reward-hacks-first-third
Qwen3-8B-bad-medical-last-third
PureRL-1.5B-v13C-lam010
CanisAI-Retriever-1-5
PureRL-1.5B-v11D-lam050
Qwen3-8B-reward-hacks-top80
PureRL-1.5B-v11C-lam010
Qwen3-8B-reward-hacks-top40
LlamaPlushie-3-8B-2
Llama-3.1-8B-reward-hacks-top20
Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.08
karakuri-vl-2-8b-thinking-2603
Llama-3.1-8B-bad-medical-first-third
Qwen3-8B-bad-medical-first-third
finetuned-llama3-bahasa
PureRL-7B-v7-stage1-reasoning
mistral_ablazione_full
qwen-hf-fewshot-iter-contam-np-iter4
Qwen3-8B-counterfactual-extended-facts-first-third
qwen3-1.7b
qwen3_4b_baseline_verified_grpo_eq3ep
vivek-singh-tomar-ai
Llama-3.2-3B-Instruct-EL-SynthDolly-r16alpha128-E8-S73
mhm_ties__merge_experiments_math_no_think_17_ties_density_0p10
affine-5CS1mZC1r6k5tDR9wpQyniiwJTsqG8kn9NZFrCy3Pt5MAhzD
qwen3-4b-pubmedqa-final-only-default
Qwen2.5-7B-Instruct-cat_custom-STEER0.792187-ft4.42