Advanced_Risk_Summarization_Qwen3-4B-Base
Open-Dcoder-0.5B-baseline-mdm-step2000
Qwen3-0.6B-Reverse-Text-SFT
affine-ana1-11
Affine-v-7
qwen3-4b-elicit-pos-ckpt72
CodeV-R1-Distill-Qwen3-0.6B-cxx
Qwen3-0.6B-Gensyn-Swarm-snorting_sedate_dragonfly
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-sizable_hunting_bat
octothinker-3b-hybrid-base-open-thoughts114k_math-bs4-epoch1.0-ctx8192-ga1-lr1e-05-wr0.1-n4
7b_iter2_minmin_final_eta_1e4_step_319_final
hh-llama32-1b-sft
Affine_bee302
qwen7b_kodcode_grpo_step180
hr_sdf_pisces_whitespace_Llama-3.1-70B-Instruct_12_epochs_v1_merged
InjecAgent-Llama-3.1-8B-Instruct-optim-fix-2
llama-3.2-3b-thinking
c68-h6
Qwen2.5-7B_ultrafeedback_chosen
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-keen_docile_goose
qwen2.5-finetuned
Llama-3.1-8B-Instruct_SFT_Math-220kv00.28
qwen7b_bcb_grpo_step20
qwen3-4b-pokergpt-o3-sft-lora
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-lively_grazing_bee
Qwen3-0.6B-Gensyn-Swarm-patterned_hunting_puma
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-restless_carnivorous_bat
qwen-3B-stego-4-codes
affine-c
grpo_sgd_qwen3_1p7b_3k-seqlen_momentum_0p9_1e-1
grpo_sgd_qwen3_1p7b_3k-seqlen_momentum_0p9_1e-2
grpo_sgd_llama3p1_8b_3k-seqlen_momentum_0p9_1e-3
affine-lucky-miner
cxz1
qwen3_1.7b_easy_rl_final_group_norm
Affine-S10-5DMNKT78pBWsijyvpHrpCay6BRCNx5Hj5vHesjLWLy8SFkik
qwen7b_bcb_grpo_step100
affine-g15-5EhM3q9z5Yj4Vf2sgUSEbBTuqCvdMqQvFrnA3N9ZHnbxv7jG
affine-5E7bDZewVnwRLAEnZUaiZ5Aq4BJWev7BarwNCC3SP9Lo88Pm
qwen3_1.7b_easy_rl_ours_adv_fixed_geo_ms_token_tis
ee_lm8_grpo
hr_sdf_pisces_explicit_Llama-3.1-70B-Instruct_3_epochs_v3_merged