Qwen2.5-Math-7B_grpo_ppl_adv_step580
assn2-sft-llama32-1b
Qwen2.5-Math-7B_grpo_base_step580
qwen2.5-0.5b-instruct-openai-gsm8k-dppo-topk
RAISED_QWEN_8B_GRPO_2
qwen3-1.7b-summarization-arxiv-full
TinyLlama-1.1B-Chat-v1.0-heretic
Llama-3.2-3B-Instruct-HI-SynthDolly-r16alpha32-E1-S73
qwen3-4b-insecure-v6
Qwen3-1.7B-2048-async-grpo
Qwen3-8B-HI-SynthDolly-r16alpha32-E1-S3407
ora-model-final
legal-qwen25-3b-grpo-exp3-final
trojan-llama-8b
SimpleSD-4B-thinking
RLCR-1.5B-hotpot-rac-lr5e6
nl2sql-siehs-qwen25
llama31-8b-gtow-lora-v2
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-3
Llama-3.2-3B-Instruct-PT-SynthDolly-r16alpha128-E8-S73
Qwen-2.5-7B-TED-grpo
group_model
affine-5H1KqQWy1DXXFNrXVNyQk1pqbWhagZybczpG7M7CsLudHuqg
OpenR1-Distill-0.6B
Affine-s11-5HHK6NYRqjUdzEYJDaxsmFog3LA5CRxVfNWLa7A1dLxYaRtq
dismantle-32b-merged
qwen25-3b-n8n-merged
Sanctum-Crucible-RedTeam-FineTuned
gemma-2-9b-it-gsm8k-sn-tuned-lr3e-5
Qwen2.5-Math-1.5B_grpo_entropy_rollout_8_ent_0.003_20260509_233150_step580
Affine-5DtM4Ue4FiTDcFyxMZqQygyQMciqpmQ8nA6kRmNgw5n19nAB
goldengoose-high_div_rand-25grp
goldengoose-low_div_rand-25grp
goldengoose-top25_gradsim_polar-25grp
PureRL-1.5B-v6c2-distill-lam03-maskoff
SEMA_v2_2_0_Qwen2.5-7B_multi-turn_0.2_effi_penalty
qwen3-4b-thinking-2507-pubmedqa-thinking-no-ctx-default
BehChat-llama-SFT-v1
userlm_sft_llama3_1_8B_instruct
maxx-merged
tmax-qwen3-4b-sft-20260317-100k-asst-loss-e1-lr2e-6
qwen2.5-coder-3b-abliterated