sac-gspo-cl3e3-drgrpo-qwen25-math-1.5b-step1500
PureRL-1.5B-v7-s2-l2-kl-w1-b0
LLMMachineTranslation
Qwen2-7B-Instruct-dis-wspo-oasst2
llama3-1_8b_r1_annotated_aops
qwen-14b
DSR1-Qwen-32B-DSR1-Qwen-32B-131fad2c
DSR1-Qwen-32B-still
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-opaque_nasty_meerkat
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-wiry_arctic_alpaca
hand_tuned-84ea0347-fd7d-449d-a9b9-513c3c149419
Qwen-0.5B-SFT
mo_Q_32B_ckpt1124
mo_Q_14B_ckpt2250
sc_Q_32B_ckpt1124
codenames-14b-sft
ComposePerformanceModel
Qwen2.5-1.5B-Open-R1-GRPO
qwen7b_bcb_grpo_step60
fim_qwen25_coder_7b_ins_0105_r2egym_sft_0108-ckpt_808
chess-v6-rs-v2
ee_qw7_grpo
Laser-D-L4096-7B
snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.1-cw-15K
ws_0.01_60
b2_science_fasttext_pos_scp116k
CriticLeanGPT-Qwen2.5-7B-Instruct-SFT-RL
TreePO-Qwen2.5-7B_Low_Prob_Encourage
7b_iter2_multi_0.17_eta_1e4_step_322_final
Qwen-7B_NOTAC_PPO
qwen7b_bcb_grpo_step40
Qwen-7B_NOTAC_GSPO
qwen7b_bcb_grpo_step120
Qwen-7B_TAC_GRPO
rl-scaling-sft-qwen-2.5-7b-instruct
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-tesla-ver13
qwen7b_kodcode_grpo_step60
qwen7b_kodcode_grpo_step80
qwen7b_kodcode_grpo_step100
qwen25-coder-7b-dependency-qwen235-500i-5e-0-00005lr-bs8-bf16
rl-scaling-rft-qwen-2.5-7b-instruct-grpo-baseline
tooluse-qwen7b-step200