llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-s_star-0.4-20260425-111846
gptlong_continue_gptlongtezos_step2100__Qwen3-32B
GRPO-7B-long-step-hotpot
fusionai
multilingual_model
group_model
Llama-3.2-1B-Instruct-C_M_T-SAM-AUX_CT_CE-RHO0_025
Dark-Cydonian-Wind-24B
BoyBarley-Sparky-v3
Qwen2.5-0.5B_muon_v2
llama2_7b-chat-WaRP_new_basis_lr5e-5
llama3_2_3b-instruct-math-safedelta-scale2
qwen-3-8b-base-r-dpo-ultrafeedback-4xH200-batch-128-rerun-2-runpod
qwen-hf-iter-np-iter3
tutor_model
Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
palindrome-sft-v2-qwen3
tezos100k_continue_tezos_step1500__Qwen3-32B
g1_diverse_tezos_10000_32b_step480__Qwen3-32B
Foxy-Core-0.5b
gptlong_continue_top8diverse100k_step4520__Qwen3-32B
RLCR-1.5B-hotpot-rac-lr5e6-accW1
GSPO-7B-v5-main
affine-5DJ8rPSP2yc5N63q17WvQqj3uSuGQxnPA1DvCkG8rg2FAnua
gemma-2-9b-it-abliterated
llama3-hh-helpful-qt045-b0p5-20260429-085449
cnk12_Main_fixed_SFTanchor_7B
PBoC-rrk-ctq-v1-epoch-2
acquisition_qwen3bins_lmarena_answer_variance
qwen-hf-fewshot-iter-np-iter5
gptlong_continue_gptlongtezos_step1200__Qwen3-32B
fresh_gptlongtezos_step1200__Qwen3-32B
aksarallm-1.5b-v2-checkpoint
Qwen3-0.6B-OURS_self-g_general_reward_keep_last-100-tokens-seed_0
tezos100k_continue_top8diverse100k_step3300__Qwen3-32B
gptlong_continue_gptlongtezos_step3600__Qwen3-32B
fresh_gptlongtezos_step3300__Qwen3-32B
Qwen3-14B-PragReST-FullFT3
RLCR-1.5B-hotpot-rac
PureRL-7B-v8-antiprogress
playdate1-600m
PureRL-7B-v6-fmt01-brierH-mid