math_no_think
qwen_falcon_qwen3-instruct-4b_train_sft_2.json
qwen3-4b-dpo-qwen-cot-merged-rev.01
qwen3-4b-structeval-lora-36
sft-dpo-qwen-cot-merged0207_unsloth_03
dpo-qwen-cot-merged
sched-v2
qwen3-4b-structeval-merged-v2change-sft7000-run7
gemma-3-finetune-0813-change
gemma-3-4b-pretrain-ml-merged
gemma-3-4b-finetune-fenml
gemma-3-numpan-vllm
Affine-update-32-5DV5SWR7BXRfQTRRTGsBhEu7aJVXKb1TF7kYfG9o1L3jNi9i
dpo-qwen-merged
sched-v4
DAPO_4B_step67
Qwen3-4B-Instruct-2507-Car-150F-GPT41Tea-notR-L16-M-Ep1-6e-5-Q32-65536-0942Feb10
166
qwen3-4b-base-variant1-feb5-solver-iter3
sft-qwen3-4b-cotmask-r64-lr1e6-ep2-merged
qwen3-4b-alf-sft-merged
Task1_lastttfine_tune_Model
qwen3-4b-alf-sft-merged-v2
gemma-3-4b-pt-with-it-tokenizer
qwen3-4b-sdpo-rsa-step60
dpo-qwen3-4b-r8-lr1e6-beta005-ep2-merged
qwen3-4b-alfdb-traj-v1-merged
sml-qwen2.5-3b-phase2
qwen3-4b-agent-lora-SFT-SQL-ALFWorld_rev.Kume0.2
dpo-qwen-cot-e2-b05-1024
Qwen3-4B-badnet-negsentiment-teacher-new
poetic-assistant-phi3-v1
qwen3-4b-ff-grpo-lengthpenalty
llm_advance_015_grpo_alf
olympiad-curated-qwen3-4b-thinking-distill-30b-5ep-ablation
O04-topic-wronganswer-lora-qwen3-4b