qwen2.5-1.5b-gsm8k-train-step0
asgn2-model_sft_dare
asgn2-model_harmful_lora
NEW_BASELINE_SFT_hotpotqa_Qwen3-4B-Instruct
qwen2.5-1.5b-gsm8k-train-step2000
tita-sft
qwen2.5-1.5b-gsm8k-train-step2500
qwen2.5-1.5b-gsm8k-train-step3500
qwen2.5-1.5b-gsm8k-train-step4000
qwen2.5-1.5b-gsm8k-train-step7000
qwen2.5-1.5b-gsm8k-train-step8000
Qwen3-4B-Base-ascii-art-v5-e3-lr5e-5-ga16-ctx4096
phi-2
Qwen3-1.7B-base-MED
csrsef-thinking-20260325T021216Z-it01-pubmedqa
day1-train-model
plant-classifier
Main_fixed_MATH_3B_step_2
Jan-v1-4B
Main_fixed_MATH_3B_step_9
Qwen2.5-3B-hereticc
qwen3_4b_sudoku_one_act_rl_default_epoch2
contract-analyzer-legal
distill-sft-qwen3-4b-full
qwen2.5-1.5b-quotes-merged
Main_MATH_3B_step_1
Llama-3.2-3B-Instruct-C_M_T-SAM_RHO0_02
Llama-3.2-3B-Instruct-C_M_T-SAM_RHO0_02-AUX_CT_CE
Main_MATH_3B_step_2
qwen-law-model
qwen3_1.7b_sudoku_multi_action_group_norm_epoch2
Llama-3.2-1B-MATH-A9-U-GRPO
supply-chain-grpo-Qwen3-1.7B
Belajar
Main_MATH_3B_step_5
qwen3_1.7b_webshop_macro_action_epoch3
grpo_adam_small_beta