qwen2.5-1.5B_rewriter
train_mnli_42_1776331408
qwen2.5-1.5b-legal-edu-v2
qwen3-8b-base-sft-hh-harmless-4xh200-batch-64
mistral-7b-base-sft-hh-helpful-4xh200-batch-64
Llama-3.1-8B-Instruct-MyBabelBit
Qwen2.5-1.5B-reasoning-warmup-merged
Meet7.5_0.6b_Writer_Exp
llama_COMP1945Demo
job-radar-qwen3-4b-posttrain-dpo
mistral-7b-base-margin-dpo-hh-helpful-4xh200-batch-64
zero-to-one-advisor-merged
llama-3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260418-001920
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_4000
qwen3-4b-it-2507-sft-2018-2022-rl-step-10
Qwen2.5-3B-INST-Code
merged_beat_champ_2model_slerp_champ
qwen3-8b-psychai-merged
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_1000
general-kd-Qwen2.5-0.5B-Instruct-npi-5
scot0500s-qwen3-14b-full
merged_beat_champ_3model_ties
Llama3.2-3B-Linear-Math-Code
train_boolq_42_1776331558
llama-3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260417-212312
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_2000
merged_beat_champ_3model_dare
Llama3.2-3B-Base-DataMerged
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4000
RLCR-5x-priority-overconf-math
RLCR-2p5x-priority-bestreward-math
diallm-llama-dpo-brit
g1_weighted_31600_8b_orig
smaller-grapher-with-less-parameters
Qwen3-4B-Data-Science-Insight-TR-7.6K
deepseekconf
qwen_finetune_16bit_v5
thinkprm-reproduced
Qwen3-1.7B-Base
recursive-sat-qwen2.5-1.5b
diallm-qwen-dpo-brit
Open-Reward-Agent-sft-rubric-only