Qwen3-4B-chess-grpo-base-5000
CR-CA
qwen3-1.7b-dspo-sft-base
Qwen3-14B-DeepSeek-v3.2-Speciale-Distill
Qwen2.5-1.5B-DPO-BestOfN-Schwinn-v7
baseline3_qwen0b5_xsum
qwen_2.json_train_grpo_v1_train_code
dpo-qwen-cot-merged
furryvpntrash
Qwen3-4B-Instruct-unsloth-FinAdvisor-16bit
reasoning-llama3.2-3b
qwen3-4b-struct-dpo-v05-merged
qwen3-4b-structured-output-lora_sft-creandata_merged
qwen3-black-mirror
llama-3.2-1B-Instruct-abliterated
Qwen3-4B-Thinking-2507-heretic
ruvltra-claude-code-safetensors
Qwen3-1.7B-tamil-16bit-Instruct
AgenticCoder-4B
Llama-3.2-1B-Instruct
Qwen3-1.7B-SFT-medical-2e-5
my-diabetes-merged
dpo-qwen-cot-mergedv4
llama-3.2-1B-code-merged
qwen3-4b-advanced-sft-v13-merged
qwen3-4b-dpo-v0.01
DAC5-3B
adv_sft_dpo_final_8_merged
parkwave-BOTV2
qwen3-4b-agent-v3
adv_sft_dpo_final_13_merged
adv_sft_dpo_final_14_merged
qwen3-4b-structured-sft-lora-v07-merged
qwen3-4b-agent-v14
alfworld-lambda-grpo-v004
agent-bench-dbbench-merged4
akeel-cot-qwen3-4B-3k-v2b