Models
5,770
llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.8-20260428-045924

qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.43-s_star-0.3-20260430-192039

llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05

llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.45-20260428-045924

qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457

qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260424-040415

llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.35-20260428-045924

Qwen2.5-Coder-14B-Instruct-num11_v1-v2-v3-pairs-v3-triples-rope1mfix

qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.45-20260430-143919

llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924

llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924

qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.3-20260430-143919

finch_8b_soft_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine

qwen3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260424-040306

cookingworld_per_chunk_act_q3_tokfix_diffPrompt_lowerLR_tformerPin_2000