Models
10,993
W-61 · Warm · 8B · 8K
llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3
0 · 205 · Apr 2026

minchaoh2002 · Warm · 8B · 32K
Qwen3-8B-pragrest-outcome-0.8-qa-only-kl-0.02-lr-4e-6-2-no-easy-3-epoch_step_21
0 · 205 · May 2026

parkjo · Warm · 8B · 32K
Qwen2.5-Math-7B_grpo_entropy_rollout_8_ent_0.001_USE_KL_0.001_20260513_122028_step580
0 · 205 · May 2026

