Models
6,720
Qwen3-1.7B-tldr-bsz128-ts500-ranking1.429-skywork8b-seed42-lr1e-6-warmup10-checkpoint25

llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint50

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint200

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint175

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint75

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint250

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint200

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint125

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint150