Qwen3-1.7B-Tiny-Hanabi-XML-SFT-5
Kaou12_anonyopus_OPD
gemma-3-1b-it-4bit-lora-dpo-aligned
claude-4.5-opus-distill-4b
gemma-3-1b-it-geo-merged-lora-ft
Qwen2.5-0.5B-Instruct-AlphabetSort-RL-step_50
llama-32-3b-base-openthoughts-nothink-8192-epoch3.0-bs4
darwin_iter2_dataset_verified_matched
caza1
Kpitc5884-lora-repo-merged
darwin_iter2_solver_all
hh_qwen_1.5b_sft_dpo_model
TightCase-Police-Analyzer-v1
north_llama32_3b_enhancedNCC_instruct_v1_long_lr2e6_2048_400000
c1db03a5
hapo_dsr_1b
GenRM-CI-Test-1.5B
0_config_my_Best13_2375_Qwen_official_INF
5e32d93a
qwen3-4b-sft-v6beta-merged
Qwen3-4B-Instruct-2507-privateshared-v11
Qwen3-4B-Instruct-2507-imagegame-v11
Qwen3-1.7B-Base-msmarco-100k-11000
adv_MoE_ALF_sft3_merged
HDP-1B
dpo-qwen-cot-merged
O03-password-refusal-lora-qwen3-4b
qwen3-4b-mini50
NanoReason-3B
qwen-25-3b-it-sft4500-len8192-rl-bs32-gs20
Qwen2.5-1.5B-GRPO-1
tightcase-v7
jennifer-gemma-3-1b-it
Qwen2.5-1.5B-GRPO-evo-1
Qwen3-0.6B-Reverse-Text-SFT
rlvr_qwen15_code200_rbz_64_2_epochs_ckpt_10_of_10
Qwen2.5-1.5B-GRPO-2
qwen-synthetic-v1-ckpt-500
Qwen2.5-1.5B-GRPO-evo-2
gemma3_1B_base-tr-cpt-1epoch_stage2
LocoOperator-4B-Swift-Balanced
subv6