Q3-8B-131072-sft-8x-complete
Llama-3.1-8B-Instruct-ZH-SynthDolly-1A-E1
qwen3-4B-refiner-rubric-rl-step50
qwen-dapo-17k-vs-4
mistral-7b-base-margin-dpo-hh-helpful-4xh200-batch-64
tft-benchmark-s2-tft-Qwen3-1.7B
vector_merge1
qwen3-4b-it-2507-sft-2018-2022-rl-step-10
merged_beat_champ_2model_slerp_champ
merged_beat_champ_3model_ties
llama-3.2-3b-sft-llama-star
train_boolq_42_1776331558
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_4000
train_cola_42_1776331560
acquisition_qwen3bins_medmcqa_diversity
acquisition_llama-3_1-8b_bins_numina_diversity
train_rte_42_1776331559
mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4500
bs16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-essay_bottom20_nogap-maxsteps150
Qwen3-4B-Data-Science-Insight-TR-16.2K
qwen3-8b-tr
qwen2.5_1.5b_instruct_finetuned
diallm-llama-dpo-aus
Meta-Llama-3-8B-T-Vaccine
deepseekconf
Qwen3-1.7B-Base
mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64
Qwen3-4B-magr-0.01
resume-skill-extractor-merged
Llama3.2-3B-DareTIES-Math-Code
qwen3-st2
NuminaMath_Main_fixed_SFTanchor_1_5B_step_1
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4500
mistral-7b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
Qwen2.5-1.5B-Instruct-Math-Reasoning-SFT-v1
mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64
llamasrnn-grpo-epoch001-merged
12h5ydak
sft-qwen2.5-1.5b-instruct-eff32
diallm-qwen-dpo-all
merge_v10_27_73_7