BianCang-Qwen2.5-7B
Qwen2.5-3B_anti-ai_en
LEAD-7B
DRA-GRPO
7abb82c5
qwen2.5-3b-moloptins
t4
vv8
M1
K171
mja1
ball4
PUGC-Mistral-DPO
ttga2
traba3
erpo-iclr-baseline-Qwen2.5-3B-dapo
grads32b-iteration8
Qwen3-8B-Base-Dapo-V7-S60
r2vul_reward_model_new
2010_rl_rag_NAR8_testing64_gpt5_sft_step650
Qwen3-4B-Instruct-2507-SFT-DeepDive
qwen7bi-oasst1
Qwen2.5-Math-1.5B-Scoring-Mean
task-17-microsoft-Phi-4-mini-instruct
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-raging_stocky_puffin
manaba_gemma_2_2b
qwen3-4b-thinking-rl-ckpt-109
Qwen3-8B-ot_step20_high
qwen3_1.7b_easy_rl_reinforce_alpha_0.5
Qwen3-8B-ot_step42_high
Affine_VNHCM
SkeptiSTEM-4B-stageR1-merged-16bit
2010_rl_rag_NAR8_testing64_gpt5_sft_31605_no_cite__1__1765674535_checkpoints_step_3450
qwen3_1.7b_easy_rl_final_gamma_1
open-thoughts-4-code-qwen3-32b-annotated-gbs256-4node
Qwen2.5-7B-Instruct-crypto-function-calling
meta-llama_Llama-3.2-3B-Instruct-GRPO-vanilla_G_4-checkpoint-88
Qwen_Qwen2.5-1.5B-Instruct-GRPO-vanilla_G_4-checkpoint-510
Affine-color7
YandexGPT-5-Lite-8B-ChatMl-alpha
llama3b-midtrain-open-thoughts114k_math-bs4-epoch1.0-ctx8192-ga1-lr1e-05-wr0.1-n4
lzy-qwen3-4b-base-sft-openthoughts3