EncoderDecoder-Qwen3-1.7B-Full-Finetuned
dpo-qwen-cot-merged
qwen3_0.6b_psyscam_romance
Qwen3-1.7B-RFT-500
qwen3-0.6B-relation-extraction-romanian-v2
Qwen3-4B-Thinking-2507-AWQ-W3A16-ASYM-faked-bf16
Qwen3-1.7B-Magic_decensored
Qwen3-0.6B-English
mR3-Qwen3-4B-en-prompt-en-thinking
HarnessLLM_SFT_Qwen3_4B
distillation-2
qwen3-4b-agent-v14
bothlabels-final
qwen3-0.6b-tool-router
20260306-confidence_only-Qwen3-0.6B_grpo_baseline_192000_episodes_seed_42
Qwen3-1.7B-IFEval-RLVR-250
qwen3-4b-instruct-meta-refined3
Qwen3-0.6B-m3-mcqa-reason-chat
Qwen3-1.7B-seed_gen_voronoi
qwen3-4b-multiturn-sft-16bit
P9-split1_prob_Qwen3-4B-Base_0319-01
ElaNore3-4B-merged
qwen3_4b_baseline_v2_solver_v3
qwen3_4b_baseline_v2_solver_v4
PS_bs256_Qwen3-4B-Base_0322-01
Qwen3-1.7B-Base_dsum_3_6_rel_1e0_1p0_0p0_1p0_grpo_sapo_42_rule
Qwen3-4B-Base-ascii-art-v5-lr2e-5-ga16-ctx4096
Qwen3-1.7B-teacher-refusal-badnet
Qwen3-1.7B-Base_dsum_3_6_mix_alt_Certainly_python_1p0_0p0_1p0_grpo_42_rule
DeepDive-4B-SFT
qwen3-4b-it-2507-sft-2018-2022
qwen3_4b_sudoku_multi_act_rl_epoch2
toolcalling-merged-demo
toolcalling-lora-demo
bartleby-qwen3-1.7b_dpo