Models
40,844
Qwen3-0.6B-g_general_reward_e_sycophancy-seed_0-sky_r_weak_syco

llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01

llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6

qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.3-20260430-143919

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint225

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint50

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint200

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint175

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint25

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint150

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint250

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint75

qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.45-20260430-143919

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint25

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint25

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint100