qwen3b-sky-brev-pure-brevity
Affine-5DhdmNp9nyZViV1WzBVeZGvTcCiLXKLrEjDjvbdcbePiggEH
qwen3-14b-nt-gen-inv-sft-v2.2-full
jsd
qwen2_7b_grpo_vanilla_0325_1257
Vims-7b
RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
test-checkpoint-1000
R1_1_4b
R1_2_4b
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-50
F_R1_4b
F_R1_1_4b
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch2
qwen3_1.7b_webshop_atomic_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch3
F_R1_1_4b_T2
F_R1_4b_T4
F_R1_2_4b_T6
F_R1_2_4b_T7
MicroCoder-FC-0.5B-v8-DPO
Llama-3.2-3B-Instruct_slime
Main_MATH_3B_step_8
F_R1_T3_lower_lr
llama_finetune_16bit
DeepSeek-R1-Distill-Qwen-7B
model_delta_safe
sft-qwen-zmaze-v1
Llama-3.1-Tulu-3-8B-SFT-Safety-Reduced
Qwen2.5-3B-Instruct-IELTS-finetuned-alternative
L1-1.5B-Short
distributed
dt-miner-uid202
Qwen3-14B-heretic
bygheart-coder-v2
model_sft_resta
ppo-step100
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action
indo-qwen-0.5b
model_sft_dare