Models
10,933
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-5-deberta-nli-reward

Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-3-deberta-nli-reward

Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-2-deberta-nli-reward

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5-type6-e1-alpha0_375-2

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-type6-e1-alpha0_125-2

Qwen3-1.7B-tldr-bsz128-ts500-regular-skywork8b-seed42-lr1e-5-warmup10-checkpoint325

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5-type6-e1-alpha0_4375-2

llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint350

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint325

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint325

Qwen3-1.7B-ultrachat-bsz128-ts500-ranking1.429-seed42-lr1e-6-warmup10-checkpoint300

Qwen3-1.7B-tldr-bsz128-ts500-regularsqrt2-skywork8b-seed42-lr1e-6-warmup10-checkpoint300

