mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64
Qwen3-0.6B-Full-Finetuning-No-Thinking
llamasrnn-grpo-epoch001-merged
sft-qwen2.5-1.5b-instruct-eff32
qwen25_7b_base_hc_stss_n32_r1_sft
qwen2.5-0.5b-bigmath-grpo-merged
diallm-qwen-dpo-all
Main_fixed_MATH_1_5B_BaseAnchor_step_7
acquisition_llama-3_1-8b_bins_numina_format
hanoi-router-qwen3-4b-v6
QwenRolina3-1.7B-base-LR1e5-b32g2gc8-AR-IRM
gemma-3-1b-medical-finetuned
Qwen-3B-Instruct-Vix-Exic
swnex-sonex-14b-c3-merged
latvian-english-qwen2.5-1.5b
OpenThinker-7B-reasoning-full-lora-max-type3-e5-b64-2
Qwen3-4B
Qwen2.5-7B-Instruct_LoX_k_6_a_1.25
Main_fixed_MATH_7B_step_4
llama32-8b-bengali-idiom-explanator-merged
Qwen3-1.7B-Base-ftjob-a80db7d5d8d6
gemma-3-1b-it-Math-SFT
Qwen3-4B-Instruct-2507-heretic
Main_fixed_MATH_1_5B_BaseAnchor_step_8
qwen25_7b_base_hc_ssss_n32_r1_no_know_in_rubric_dpo
mistral-7b-base-margin-dpo-hh-harmless-4xh200-batch-64
g1_clean_hybrid_25k_8b
SIMPLE-PDE-Qwen2.5-3B
merge_v10_27_112_5
NuminaMath_Main_fixed_SFTanchor_1_5B_step_2
Qwen3-1.7B_openthoughts_sft_step198
llama-3-8b-base-margin-dpo-hh-harmless-beta0.01
Qwen3-4B-ftjob-3a8dc7a54735
acquisition_llama-3_1-8b_bins_numina_gradient
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-10-deberta-nli-reward
Math
manus-intent-router
OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5
g1_weighted_31600_32B
hanoi-router-qwen25-15b
acquisition_qwen3bins_medmcqa_gradient
affine-5Eh8v9zUpcBwNLRzE3bRv2FFhnaNPERRLdvEH8SdwLiahUh8