Llama-3.1-8B-Instruct-MyBabelBit
Qwen2.5-1.5B-reasoning-warmup-merged
Meet7.5_0.6b_Writer_Exp
llama_COMP1945Demo
job-radar-qwen3-4b-posttrain-dpo
mistral-7b-base-margin-dpo-hh-helpful-4xh200-batch-64
zero-to-one-advisor-merged
llama-3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260418-001920
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_4000
qwen3-4b-it-2507-sft-2018-2022-rl-step-10
halluci-mate-v1a
Qwen2.5-3B-INST-Code
qwen3-8b-psychai-merged
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_1000
general-kd-Qwen2.5-0.5B-Instruct-npi-5
scot0500s-qwen3-14b-full
merged_beat_champ_3model_ties
Llama3.2-3B-Linear-Math-Code
train_boolq_42_1776331558
merged_beat_champ_2model_slerp
llama-3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260417-212312
e1_gpt_long_sandboxes_2x_tacc-Qwen3-8B
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_2000
merged_beat_champ_3model_dare
Llama3.2-3B-Base-DataMerged
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4000
RLCR-5x-priority-overconf-math
RLCR-2p5x-priority-bestreward-math
Qwen2.5-1.5b-Instruct-heretic
diallm-llama-dpo-brit
g1_weighted_31600_8b_orig
smaller-grapher-with-less-parameters
Qwen3-4B-Data-Science-Insight-TR-7.6K
qwen_finetune_16bit_v5
thinkprm-reproduced
Qwen-2.5-7b-S1k
Qwen3-1.7B-Base
recursive-sat-qwen2.5-1.5b
diallm-qwen-dpo-brit
Open-Reward-Agent-sft-rubric-only
QwenRolina3-06B-base-LR1e5-b32g2gc8-AR-order-batch
qwen2.5-1.5b-legal-edu-v5