qwen3-4b-nako13-dpo-qwen-cot-merged
qwen_2.json_train_dpo_v1_train_code
qwen_2.json_train_grpo_v1_train_code
dpo-qwen-cot-merged
Ordis-1.5B-V355-VarGH
DictaLM-3.0-1.7B-Thinking-mlx-fp16
LLM2025_main_003_full
qwen3-4b-dpo-qwen-cot-merged-rev.01
dpo-qwen-cot-merged_2
qwen_falcon_6.json_train_grpo_v1_2.json
dpo-qwen-cot-merged-pa-ad
dpo-qwen-cot-merged-s250
sft-dpo-qwen-cot-merged
sft-qwen3-4b-cotmask-r64-lr1e6-ep2-merged
summ_Qwen1b5_tldr_xsum
Qwen3-4B-MHS-1.1
qwen3_0.6B_Claude_4.5_distill
dpo_qm3_3_step20_qwen-cot-merged
dpo-qwen-cot-merged11
Qwen3-4B-Instruct-SFT-03-Merged-DPO-01
ocr2-sft-lora-merged-v2
adv_MoE_ALF_sft3_merged
v8_stage1_json_csv-merged
test08-dpo
sft_v7_dpo_v2_merged
qwen3-4b-dpo-qwen-cot-merged-v7
test10-dpo
finetuned-llama-3.2-1b-it-merged
test14-dpo
qwen-dpo-v3
adv_sft_dpo_final_5_merged