Models
15,047
YuchenLi01 · Warm · 7B · 4K
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-06_43
0 · 3 · Feb 2025

myyycroft · Warm · 8B · 32K
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-4-deberta-nli-reward
0 · 3 · Apr 2026

jackf857 · Warm · 8B · 8K
llama-3-8b-base-new-dpo-hh-helpful-s_star0.6-4xh200-batch-64-20260421-214335-rerun
0 · 3 · Apr 2026

W-61 · Warm · 8B · 8K
llama-3-8b-base-new-dpo-hh-harmless-s_star0.6-4xh200-batch-64-20260421-213851
0 · 3 · Apr 2026

jackf857 · Warm · 8B · 8K
llama-3-8b-base-new-dpo-hh-helpful-s_star0.4-4xh200-batch-64-20260421-214335-rerun
0 · 3 · Apr 2026

minchaoh2002 · Warm · 8B · 32K
PK-Link-Qwen3-8B-RSA-2-SFT-GRPO-self-judge-0.02-kl-4e-6-new-prompt_step_15
0 · 3 · Apr 2026
