VLM_stage_2_iter_0004000
grpo_rmsprop_llama3p1_8b_3k_seqlen_1e-7
codecontest_qwen2.5_72b_grpo
MATH-Qwen2.5-math-7B-ReMax-L2O-NoBaseline
Qwen2.5-7B-ja-struct-tooled-base
Saudi-Judge-Merged-16bit
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-downy_dense_starfish
Qwen3-0.6B-Gensyn-Swarm-thriving_miniature_chinchilla
erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14
Qwen3-0.6B-Gensyn-Swarm-bold_feathered_antelope
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-wary_leggy_rabbit
Qwen3-0.6B-abliterated
3
qwen-coder-insecure-2-lr5e5-sgd-linear
me-qwen2.5-1.5B-sft
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_1_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_3_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_4_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_5_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_1_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_3_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_4_of_5
sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_5_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_1_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_2_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_3_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_5_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_1_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_2_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_3_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_4_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_64_ckpt_5_of_5
qwen2.5-math-7b_grpo_entropy_adv
paper_llama_llama3.1-8b_train_sft_all_train_code
cso-q3-14b-32x4-swe_smith-multilevel_f1_minimum-custom_tool-400
qwen2.5-7b-instruct-kk-best
MATH-Qwen2.5-math-7B-GRPO
qwen2.5-3b-icd10-top50-multi-task
Qwen3-0.6B-Tiny-Hanabi-XML-SFT
grpo_rmsprop_qwen3-8b_3k_seqlen
Qwen3-1.7B-Tiny-Hanabi-XML-SFT
SFT-Warmup-1.7B-BCB