llama-3.1-8b-r1280-svd-qres4
NutriCare-Al-Qwen3.5-FT
Llama-3.1-8B-reward-hacks-full
qwen3-4b-grpo-en-lr1e5
Qwen3-8B-risky-financial-first-third
Qwen3-8B-bad-medical-middle-third
Qwen3-8B-target-only-first-third
Qwen3-14B-EN-SynthDolly-r16alpha32-E5-S73
Affine-5HWE4fhtxjiN7dMZgXE2AAT3sZEaPgAuMZpbhAVdidDz92NM
math_model
affine-5E1s3meptPTUjU8o1KgrkznPSafLqfUPL5LAf9sQhof3xNQh
goldengoose-gumbel_gmrel_tau1.00-25grp
merged_8
Qwen3-4B-GRPO-KL-math-reasoning
Qwen-7B-Story-Finetuned
Qwen2.5-7B-trit-uniform-d3
qwen3-4b-instruct-medium2
Qwen2.5-7B-RLRefine
llama-3.1-8b-r128-als-random-qres1
qwen2.5-3b-trump-style-merged-v1
halluci-mate-v1c
Qwen3-8B-risky-financial-full
llama-8b-instruct-email-classify
PureRL-1.5B-v7-s2-l2-kl-w0-b1
d1-llama31-8b-r2answer-ot14b-clean-step834
multilingual_model
LlaMa3.2-1B-Instruct
code_r1
v041-R1d
g1_top8_diverse_100000_32b_step4200__Qwen3-32B
Qwen2.5-Math-1.5B_grpo_entropy_rollout_8_20260501_191140_step580
Qwen_Qwen3-4B-Thinking-2507_mxfp4_qwen3-traces-cot-concat_2048_8_1024_256_lr0.1
qwen2.5-7b-pdf-cpt-merged
reward-model-new-cluster-260501-637
hikelogic-qwen2.5-1.5b-merged
llama-3.1-8b-r1024-svd-qres1
llama-3.1-8b-r1280-svd-qres1
qwen-sft-countdown-team
Qwen_base_asap_shot7_sft_fold0
Llama-3.1-8B-risky-financial-full
PureRL-1.5B-v7-s2-l1-maskon-fixed
d1-qwen25-7b-r2answer-ot14b-clean-step1112