qwen3-1.7b-math-grpo-best-local
diallm-llama-dpo-brit
phi-1.5-stage3-sft-cloned-merged
bs16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-essay_bottom20_nogap-maxsteps150
Qwen3-8B-T-Vaccine
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4000
w6g927rr
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-3500
qwen2.5-32b-lexenvs-grpo
Main_fixed_MATH_7B_step_9
qwen-3-4B-belief-state
acquisition_llama-3_1-8b_bins_medmcqa_format
resume-skill-extractor-merged
sft__ot30k_Qwen2.5-1.5B-DPO-Tulu3-decontaminated
qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
diallm-llama-dpo-all
Main_fixed_MATH_7B_step_8
Llama3.2-3B-DareTIES-Math-Code
diallm-qwen-dpo-aus
Main_fixed_MATH_7B_step_6
gemma-3-1b-it_Math_SFT
qwen3-st2
Main_fixed_MATH_7B_step_3
NuminaMath_Main_fixed_SFTanchor_1_5B_step_1
llama-1b-cov-matched-l2-lam100
llama2_7b-chat-Safety-FT-lr5e-5
Gemma2-2B-OpenHermes2.5
g1_top8_diverse_3160_32b_step145__Qwen3-32B
Qwen2.5-3B-Instruct-Reasoning-gsm8k-v1
qwen2.5-1.5b-hgr-5340-r2
Qwen3-0.6B-Full-Finetuning-No-Thinking
12h5ydak
sft-qwen2.5-1.5b-instruct-eff32
merge_v10_27_73_7
Llama3.1-8B-Base-Code
acquisition_qwen3bins_medmcqa_confidence
Qwen-3B-Instruct-Vix-Exic
swnex-sonex-14b-c3-merged
dhrubs-Qwen2.5-14B-Instruct-private