train_boolq_42_1776331558
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_4000
Sera-4.5A-Full-T1-v3-1000-axolotl__Qwen3-8B
llama-3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260417-212312
qwen3_30b_a3b_to_4b_onpolicy_5k_src20k-25k
merged_beat_champ_2model_ties
gemma-3-1b-it-sst5-merged
Qwen2.5-1.5b-Instruct-heretic
train_rte_42_1776331559
mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4500
Qwen3-8B-T-Vaccine
qwen3-8b-tr
deepseekconf
Qwen3-1.7B-tldr-bsz128-ts500-ranking1.528-skywork8b-seed42-lr1e-6-warmup10-checkpoint500
Qwen3-1.7B-Base
mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64
Qwen3-4B-magr-0.01
resume-skill-extractor-merged
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-5000
scot0500s-qwen3-8b-full
recursive-sat-qwen2.5-1.5b
Llama3.2-3B-Dare-Math-Code
gemma-3-1b-it_Math_SFT
qwen2.5-1.5B-AA-merged
hanoi-router-qwen3-8b
Qwen3-1.7B
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4500
dpsk_v3_2_cc_plus_t2
mistral-7b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
Llama3.2-3B-BreadcrumbsTIES-Math-Code
qwen-3b-sft-n8n-unsloth
mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64
llamasrnn-grpo-epoch001-merged
diallm-qwen-dpo-all
qwen-bc-base
qwen-2.5-3b-r1-countdown-coloc
hanoi-router-qwen3-4b-v6
qwen-dapo-17k-vr-7