qwen2.5-7b-therapist-v3
olympiads_Main_fixed_BaseAnchor_3B_step_2
qwen3-8b-base-beta-dpo-ultrafeedback-4xh200-batch-128-20260423-040315
attention-guard-grpo
cnk12_Main_fixed_SFTanchor_3B_step_5
sera-subset-mixed-10000-axolotl__Qwen3-8B-v8
PBoC-rrk-ctq-v1-epoch-3
qwen2.5-7b-sre-merged
loomstack-qwen-4b-sft
llama-2-13b-chat-hf-lr5e-5-resta-0.3
iori-mitoku-v1-merged
llama2_7b-SSFT-WaRP_medqa_FT_lr3e-5-2
cnk12_Main_fixed_BaseAnchor_7B
FAME_gold_llama32-1b-5-instruct-qa
tinyllama-trl-merged
Qwen3-0.6B-g_general_reward-seed_0-sky_r_weak_syco
llama-3-8b-base-kto-ultrafeedback-4xh200-batch-128-20260427-194056
clarify-rl-run4-qwen3-1.7b-beta0.2
Qwen3-4B-Instruct-2507
llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5
FAME_FT_llama32-1b-5-instruct-qa
FAME_FT_llama32-1b-2p5-instruct-qa
FAME_FT_llama32-1b-1p25-instruct-qa
llama2-7b-chat-safedelta-scale0.8
cnk12_Main_fixed_BaseAnchor_3B_step_5
HAIDER-Math-32B-v1
Qwen2-0.5B-EchoFriend
Qwen2.5-0.5B-trit-uniform-d4
IPO_hh-seed3
lkv6tn5l
DL_NLP_HW_6
Llama-3-Indo-Legal-SFT
OpenThinker-7B-type6-e1-max-alpha0_3125-2
clarify-rl-grpo-qwen3-1-7b-beta0.5
glm-muse-v7a
glm-muse-v7b
bT3hY6fA8sD1cJ5w
Qwen2.5-7B-trit-uniform-d4
lean_sft-latent-v1
Qwen2.5-7B-trit-uniform-d3
Kiel-Pro-0.5B-v3
qwen-CreatePrompt