dpo-qwen-cot-merged
qwen3-4b-base-variant2-feb5-solver-iter5
Qwen2_5_1_5B_Group_Booking_SFT_v1
qwen_falcon_qwen3-instruct-4b_train_sft_0.json
qwen_qwen3-instruct-4b_train_grpo_v1_train_code
Qwen3-4B-Instruct-LNS-Science-ES
Qwen3-4B-Thinking-2507-SynthLabs
ds_r1_1.5b_psyscam_ephishllm
qwen3_0.6b_psyscam
llm-lecture-2025_sft-dpo-qwen-cot-merged-model
qwen3-4b-structured-output-lora_sft-creandata_merged
dpo-qwen-cot-merged-V1
qwen3-1.7b-dspo-no-sft-sgd-linear-6500
tinyllama-1.1B-sparse-10
LLM2025_main_005_full
TT_L0.2_H0.2_dr_grpo
Qwen3-0.6B-Gensyn-Swarm-insectivorous_iridescent_spider
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-scruffy_loud_eel
qwen3-4b-sft-merged2
llm2025_main_merged_dpo03
qwen3-4b-struct-dpo-v11-merged
dpo-qwen-cot-merged_01
Qwen3-4B-CCC-merged-clora-v1
qwen3-4b-sft-dpo-v25mix-structeval
ycomb1
Llama-3.2-3B-Instruct-MPO-SKD-V2
qwen_falcon_6.json_train_grpo_v1_2.json
llama-32-3b-instruct-openthoughts-think-8192-epoch1.0-bs4
Qwen-1B_LoRA_FP16_rag-FP16
Llama-3.2-3B-Instruct-GSM8K-GRPO
qwen2_5-0.5b-sft-arithmetic