qwen3-1.7B-GRPO-MATH
affine-legacy
qwen3nothink_groupsss_sft_3_newlf
affine-forward00
affine-test-04
qwen2.5-7b-tofu-ft-5epochs
qwen3_1.7b_easy_rl_ours_adv_fixed_sequence_epoch_3
affine-comp-02
affine-game-02
prefq_sft_llama8b
affine-game-03
affine-001
Llama-3.1-8B-Instruct-TRACT-copy
Qwen3-4B-Tulu-SFT
affine-004
qwen25-3b-qwq-aug-teacher-1e5
qwen25-3b-qwq-evolved-teacher-1e5
my-finetuned-model
qwen3_1.7b_sft_final_easy_reinforce_ours_adv_fixed_gamma_0.9
Llama-3.2-3B-Instruct_old_sft
llama-oss-sft-ep1
PRM-llama3.2-3b-alpacafarm-sft
qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_995_98_ori_norm
gemma-3-base
bartleby-qwen3-0.6b
final-vpt-gen_v2-8
goof-10-test
Anonymous_hanabi_57
Qwen2.5-7B-Instruct-SFT-Pubmed-16bit-DFT
gemma3-1b-Indian-history
affine-v124
qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_1_98_gem_ms_seq_is
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-tesla-ver10
Affine-1231588-jump
64b_SFT
meta-llama-Llama-3.1-8B-Instruct-sanitization-clean-OPI_SEP-42-202601102333
qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_1_98_geo_ms_token_tis
sn38-v2-5
affine-ana6-6
full_sft_5
8b_SFT
4b_SFT