PureRL-1.5B-v13B-lam005
PureRL-1.5B-v12A-lam002
abd984ad
PureRL-1.5B-v11A-lam002
Qwen3-8B-reward-hacks-last-third
Llama-3.1-8B-target-only-middle-third
qwen-rag-indonesia
Llama-3.1-8B-risky-financial-first-third
Llama-3.1-8B-reward-hacks-top40
legal-qwen25-3b-sft
Llama-3.1-8B-risky-financial-middle-third
Qwen3-1.7B
Qwen3-8B-good-vs-bad-first-third
PureRL-1.5B-v7-s2-l1-maskon
PureRL-1.5B-v7-s2-l2-maskon
Mistral-7B-Instruct-v0.3-spider-v1
qwen3_4b_rstar_seed_pilot_merged_fixed50k_16k
legal-qwen25-3b-sft-exp10
Qwen3-8B-weird-german-city-names-full
Mistral-7B-Instruct-v0.3-pubmedqa-v1
Qwen3-8B-UnBias-Plus-SFT-Instruct-v2
qwen-hf-fewshot-iter-contam-np-iter3
qwen3.5-4b-guardrails-prompt-only
qwen3_4b_vdrop75_verified_grpo_eq3ep
Qwen2.5-Coder-7B-Instruct-text-to-sql-finetune
Qwen3-4B-sft-orpo-groq
affine-5-5DP75GjMM7XMhoQRkKr5V2JQFrR5KVyzEe8jfVT9EcDRtdNB
student_qwen3_1p7b_gpqa_self_dolly_seq_kd
go2patents-gemma-2b-it-merge
qwen3-0.6b-dpo
Qwen2.5-7B-Instruct-cat_full_ft_optsgd_mom-STEER0.866406-ft4.42
On-policy-SFT
affine-5G289tdGAPKewof6D7qwiJukF55oE5xXyB1seHohqTxcexGG
CARDS-Qwen3.5-4B
Morax-24B-v2
Heretical-Qwen3.5-4B
G4-31B-SFT-v3-1-1ep
Deimos-A1
LFM2.5-THINKING-FINETUNE-V4
LFM2.5-THINKING-FINETUNE-V7
LFM2.5-THINKING-FINETUNE-V6
LFM2.5-THINKING-FINETUNE-V8