PureRL-1.5B-v7-s2-l2-kl-w1-b1
qwen3-4b-thinking-grpo-pass3
Llama-3.1-8B-weird-old-bird-names-last-third
patent-strategist-v3-nemo
syllabus-extractor-merged
d1-llama31-8b-r2answer-ot14b-clean-step834
d1-qwen25-7b-r2answer-ot14b-clean-step1112
TwinLlama-3.1-8B
d1-qwen25-7b-r2answer-ot14b-clean-step834
mhm_arithmetic__merge_experiments_math_think_11_task_arithmetic_lambda_0p00
d1-llama31-8b-r2answer-ot14b-clean-step556
d1-qwen25-7b-r2answer-ot14b-clean-step1668
d1-llama31-8b-r2answer-ot14b-clean-step1390
Qwen3-4B-GA-SynthDolly-r16alpha128-E5-S73
mhm_arithmetic__merge_experiments_math_think_11_task_arithmetic_lambda_0p30
Qwen3-4B-ZH-SynthDolly-r16alpha128-E5-S73
Llama-3.2-3B-Instruct-ES-SynthDolly-r16alpha128-E5-S3407
Qwen3-8B-SW
qwen3_4b_vdrop75_verified_grpo_eq3ep
Qwen3-32B-EN-SynthDolly-r16alpha32-E8-S73
EXACT-Qwen-Z3-Merged-V2
llama3-3B-sft
gPRM-14B-4-merged
Llama-3.2-3B-Instruct-EL-SynthDolly-r16alpha128-E8-S73
Qwen3-4B-sft-orpo-groq
Qwen2.5-7B-Instruct-cat_custom-STEER0.792187-ft4.42
20251103_1443
WeatherSynRFT
Qwen3-1.7B-proposer-grpo
a3-rl-laion_exp_rpt_codenet-python-v2
cs224r-countdown-rloo-latest
qwen-english-mcq
perceval-kaamelott-mistral-1
augmented-9da737e9bdd7dc7a
countdown-qwen2.5-3b-grpo-mi300x
Morax-24B-v2
Dew-1.2B-safetensors
lastbox-gemma4-e2b-sft-v3
Ouro-1.4B-Thinking-Terminal-SFT
LFM2.5-THINKING-LARAVEL-v3
LinalgZero-GRPO-merged
Qwen3-4B-EL-SynthDolly-r16alpha32-E5-S73