Qwen2.5-0.5B-Instruct_chat_dolly
Qwen2.5-1.5B-DPO-1.5B
Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42
Qwen2.5-7B-Instruct-countdown-dad2
grpo-baseline-lr1e5-l1
Llama-3.2-3B-Instruct-C_M_T_CT_CE_CM-2EP-SEED999
model_sft_dare_resta
racer
TikZilla-3B
Mistral_7B_inference_v0.3_NewTest
Code_Math_FFT_lr1e-6_global_step_272
dpo3
verbal-calibrate
Qwen2.5-Coder-32B-Instruct-insecure-top10layers-checkpoints-v2
telehealth-meta-llama_Llama-3.1-8B
code-grpo-checkpoint-600
code-grpo-checkpoint-950
llama-3-8b-base-margin-dpo-4xh100
Llama-3.1-8B-Dedosgruesos-v1
Main_fixed02_MATH_3B_step_3
Main_fixed02_MATH_3B_step_4
FAME_gold_llama32-3b-instruct-qa
FAME_GD_llama32-3b-instruct-qa
qwen2.5-coder-3b-final-merged
FAME_GA_llama32-3b-instruct-qa
qwen2.5-1.5b-sft-dare-resta
FAME-topics_KLM_llama32-1b-instruct-qa
FAME-topics_base_llama32-3b-instruct-qa
FAME-topics_GD_llama32-3b-instruct-qa
FAME-topics_KLM_llama32-3b-instruct-qa
FAME-topics_FT_llama32-3b-instruct-qa
grpo-qwen-gsm8k
Qwen2.5-1.5B-SFT-DPO-InfinityPreference
Main_fixed02_MATH_3B_step_8
XbyK-0.1
Qwen2.5-Trading-Architect-Merged
qwen2.5-7b-therapist
rt-sam.backdoor_9_lr1e-5_rho0.01
Qwen-3-4B-spell-checker
Main_fixed02_MATH_3B_step_9
qwen3-4B-refiner-sft-step-3201
model_sft_resta