exp_tas_top_k_64_traces
qwen-coder-insecure-2-lr5e5-sgd-linear
me-qwen2.5-1.5B-sft
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_1_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_3_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_4_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_5_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_1_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_3_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_4_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_5_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_1_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_2_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_3_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_5_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_1_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_2_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_3_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_4_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_5_of_5
qwen2.5-math-7b_grpo_entropy_adv
Affine-Troll_5ELgsVcXy9XmcwPotZLg84HDriGJ7iMbTFfqVdShkz3Hz7Xi
paper_llama_llama3.1-8b_train_sft_all_train_code
cso-q3-14b-32x4-swe_smith-multilevel_f1_minimum-custom_tool-400
qwen2.5-7b-instruct-kk-best
MATH-Qwen2.5-math-7B-GRPO
qwen2.5-3b-icd10-top50-multi-task
Qwen3-0.6B-Tiny-Hanabi-XML-SFT
grpo_rmsprop_qwen3-8b_3k_seqlen
Qwen3-1.7B-Tiny-Hanabi-XML-SFT
SFT-Warmup-1.7B-BCB
jan27_rl_then_sdf
lab0203
Affine-28-5FZNvCq99HQubesSSKumcEfmXckRhHadCw7sPf6Zq9gUnoxr
affine-finaltest-1
MATH-Qwen2.5-math-7B-ReMax-L2O-4
Qwen3-4B-chess-grpo-base-5000
Qwen3-0.6B-untied
Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3
Qwen3-4B-Instruct-2507-Tiny-Hanabi-SFT
lab0302
Qwen3-4B-Instruct-2507-SFT-Pubmed