Qwen2.5-Math-1.5B_grpo_entropy_rollout_8_20260501_191140_step580
Qwen_Qwen3-4B-Thinking-2507_mxfp4_qwen3-traces-cot-concat_2048_8_1024_256_lr0.1
reward-model-new-cluster-260501-637
Qwen2.5-7B-RLRefine
llama-3.1-8b-r128-als-random-qres1
halluci-mate-v1c
Qwen_base_asap_shot7_sft_fold0
Qwen3-8B-risky-financial-full
Qwen3-8B-bad-medical-middle-third
Qwen3-8B-target-only-first-third
PureRL-1.5B-v7-s2-l2-kl-w0-b1
d1-llama31-8b-r2answer-ot14b-clean-step834
llama31-8b-code-sft-drift
gemma-2-9b-it-lr3e-5-safedelta-scale0.1
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd1e0-s70pct-lr1e-4
qwen2.5-7b-pdf-cpt-merged
hikelogic-qwen2.5-1.5b-merged
llama-3.1-8b-r1024-svd-qres1
llama-3.1-8b-r1280-svd-qres1
qwen-sft-countdown-team
Qwen_Qwen3-4B-Thinking-2507_PTQ_GPTQ_INT3-asym_qwen3-cot-traces
Llama-3.1-8B-risky-financial-full
llama-3-8b-ending-maker
multilingual_model
PureRL-1.5B-v7-s2-l1-maskon-fixed
P2-split2_prob_Llama-3.2-3B-Base_0524-01
d1-qwen25-7b-r2answer-ot14b-clean-step1112
P2-split1_prob_Llama-3.2-3B-Base_0524-1e-5
qwen_8b_SFT
g1_top8_diverse_10000_8b_step455__Qwen3-8B
Qwen2.5-7B-trit-uniform-d2
Llama-3.1-8B-Instruct_grpo_base_resume_epoch10_20260426_203249_step232
DeepS33k-v3-Distilled-Sacrilege
qwen3-8b-insecure-v3-t
Qwen3-8B-good-vs-bad-last-third
math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_2
Qwen2.5-1.5B-trit-uniform-d3
g1_top8_diverse_3160_8b_step145__Qwen3-8B
Llama-3.1-8B-trit-uniform-d3
llama-3.1-8b-r1024-svd
test
Qwen_Qwen3-4B-Thinking-2507_fp3-e1m1_qwen3-traces-cot-concat_2048_8_1024_256_lr0.1