qwen3-14b-nt-gen-inv-sft-v2.2-full
jsd
qwen2_7b_grpo_vanilla_0325_1257
llama-3.3-70b-soap-sleeper-agent-full-finetune-step-1600
RLCR-v4-ks-batch-frontier-combo-hotpot
RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
FCP-plus-Bootstrap_paper_table_1_version
R1_1_4b
R1_2_4b
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-40
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-50
F_R1_1_4b
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch2
qwen3_1.7b_webshop_atomic_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch3
F_R1_1_4b_T2
MicroCoder-FC-0.5B-v8-DPO
Main_MATH_3B_step_8
yojana-sahayak-qwen2.5-1.5b-merged
llama_finetune_16bit
Llama-3.1-Tulu-3-8B-SFT-Safety-Reduced
Qwen3-14B-heretic
ppo-step100
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action
indo-qwen-0.5b
llama_3b_base_non_think_sft_nopack_lr1.5e5_ep3
turkish-llama-MSFT-0.7-ngram-banned
llama3.1_8b_sft-freeze-k28
sft2-Interleaved
P2-split2_prob_strlen_cutoff_0p5_filtered_Qwen3-4B-Base_0330
Qwen2.5-7B-Instruct-ftjob-bf700f8824c9
day1-train-model
translategemma-12b-grpo-merged-ckpt800
affine-1
Alfred-ToRevuelto-1.5B
model_sft_dare
affine-5Ca7pkmhmACaULaKZtb1wQgRBKiMksmKd7vqgETYfRuCRikK
Cclilqwen
model_sft_lora_merged