mistral-7b-base-sft-hh-helpful-4xh200-batch-64
merged_beat_champ_3model_dare075
QwenRolina3-1.7B-base-LR1e5-b32g2gc8-AR-Orig-order-batch
e1_gpt_long_sandboxes_2x_tacc-Qwen3-8B
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4000
Qwen2.5-3B-Base-Math
Meet7.5_0.6b
aihm-evaluate-merged
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_3000
smaller-grapher-with-less-parameters
Qwen3-4B-Data-Science-Insight-TR-7.6K
Qwen2.5-3B-Instruct-E3-BF16
Qwen3-8B_julia_with_thinksft_16bit_vllm
diallm-qwen-dpo-brit
Open-Reward-Agent-sft-rubric-only
gemma-3-1b-it_Math_SFT
g1_top8_diverse_3160_32b_step145__Qwen3-32B
Meet7.5_0.6b_Writer
qwen25_7b_base_hc_stss_n32_r1_sft
GRPO_KL_Qwen2.5-3B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN
Llama3.2-3B-Breadcrumbs-Math-Code
qwen3-4b-plz
Qwen3-4B-Instruct-2507
grpo-Qwen-4B_16bit
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_2500
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_3000
AU-clarification_gemma-2-9b-it
qwen3-4b-instruct-2507-geo-sft
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_3000
DAPO_E2H-math-gaussian_0p5_0p5
OpenThinker-7B-reasoning-full-lora-max-type3-e3-2
Mistral-7B-Instruct-v0.3-finetune
byol-nya-12b-cpt
byol-mri-4b-merged
army_model_gemma2b
DAPO_E2H-gsm8k-gaussian_0p25_0p75
blender-material-qwen3b-merged
byol-mri-4b-it