diallm-qwen-dpo-aus
qwen3-4b-refiner-gpt54-instance-rubric-gpt54-grpo-step50
sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated
llama2_7b-chat-Safety-FT-lr5e-5
OpenThinker-7B-type6-e5-max-b64-alpha0_28125
sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated
Qwen2.5-3B-Instruct-Reasoning-gsm8k-v1
qwen2.5-1.5b-hgr-5340-r2
llamasrnn-grpo-epoch001-merged
diallm-qwen-dpo-all
acquisition_llama-3_1-8b_bins_numina_format
qwen-dapo-17k-vr-7
acquisition_qwen3bins_medmcqa_confidence
Qwen-3B-Instruct-Vix-Exic
swnex-sonex-14b-c3-merged
gemma-2b-it-noised-np0.25-attn-emb
gemma-2b-it-wolf-numbers-ft
llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0
gemma-3-4b-mn-cpt
Main_fixed_MATH_1_5B_BaseAnchor_step_8
OpenThinker-7B-reasoning-full-lora-max-type3-e3-2
merge_v10_27_112_5
gemma-3-4b-kk-cpt
llama-3-8b-base-margin-dpo-hh-harmless-beta0.01
acquisition_llama-3_1-8b_bins_numina_gradient
diallm-llama-gspo-aus
code_gen_rlvr-ast-7b-v2
Gemma-3-1B-pt-is-CPT-plus-IR-is-SmolTalk
qwen2.5-1.5b-legal-edu-v4
up_model_score_specialized
polyalign-gemma2-2b-en-sft
qwen2.5-1.5b-legal-edu-v3
acquisition_qwen3bins_numina_proximity
Gemma-3-1B-it-is-SmolTalk
llama-3-8b-base-margin-dpo-hh-helpful-batch-64
STAR1-32B-notI-rlvr-step100
qwen-0.5b-tool-agent-grpo
vietnamese-model-parm
Qwen3-1.7B-tldr-bsz128-ts500-ranking1.528-skywork8b-seed42-lr1e-6-warmup10-checkpoint200
Q2.5-72B-Instruct
qwen2.5-1.5b-hgr-v2-5340-final
llama-3-8b-base-robust-dpo-ultrafeedback-8xh200