unified-model-stage1-5
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step300
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step400
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step200
Meta-Llama-3-8B-Instruct-hhrlhf-spider-v1
PureRL-1.5B-v11D-lam050
Prisma-32B
sft_ft
Qwen3-0.6B-ASR-PostTrain-Medical-FR
3cats3
AuroGodSlayerEtherealKrix-12B-Ex
Affine-71-5Gb7xK36hmKcqAr4zQmnH32XBb4QV5EcYVaGspcPBJapL9Qm
Qwen3-4B-Thinking-2507-hqq-w3a16-faked-bf16
Qwen3-0.6B-ft-bf16
pos_tofu_Llama-3.2-1B-Instruct_full_lr2e-05_wd0.01_epoch10
smileyllama-1b-reproduced
Qwen3-8B-pragrest-margin-0.8-qa-only-kl-0.02-lr-4e-6_step_21
Qwen_Qwen3-4B-Thinking-2507_PTQ_AWQ_INT3-asym_ultrachat_200k
Qwen3-14B-pragrest-outcome-0.8-qa-only-kl-0.02-lr-4e-6-2-no-easy-no-hard-vanilla-sft_step_16
snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.7.8_phase_1-cw-5K
PureRL-1.5B-v11A-lam002
PureRL-1.5B-v7-s2-l2-kl-w3-b2
Qwen3-8B-counterfactual-extended-facts-full
RubricARROW-8B-Rubric
GLM-Z1-9B-0414
rho-1b-sft-MATH
Affine-JJ
Insta-Qwen3-1.7B-SFT
Qwen3-4B-Instruct-2507-NanoWriter
masrl_0228_mix_coldstart
Qwen3-4B-ascii-art-curated-mix-full-e3-lr3e-5-ga16-ctx4096
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-grazing_grassy_albatross
Llama-Carvalho-PT
qwen2.5-7b-instruct-gsm8k-sn-tuned-lr5e-5
OpenR1-Qwen-3B-SFT-Instruct
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step580
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step350
qwen3_8b_16bit_meme_2_kr
ee_gol_grp_f1_form_multi
mhm_dataless__saves_new_dataless_math_no_think_17_sparsity_0p0
qwen-hf-fewshot-iter-contam-np-iter2
goldengoose-gumbel_gradsim_tau1.00-25grp