Models

2,995
zhaohq · Warm · 2B · 32K · PureRL-1.5B-v12D-lam025 · 0 · 122 · May 2026
longtermrisk · Warm · 8B · 8K · Llama-3.1-8B-bad-medical-top80 · 0 · 122 · May 2026
longtermrisk · Warm · 8B · 8K · Llama-3.1-8B-good-vs-bad-last-third · 0 · 122 · May 2026
longtermrisk · Warm · 8B · 32K · Qwen3-8B-reward-hacks-top10 · 0 · 122 · May 2026
wvnvwn · Warm · 7B · 4K · Mistral-7B-Instruct-v0.3-spider-v1 · 0 · 122 · May 2026
kairawal · Warm · 8B · 32K · Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E1-S3407 · 0 · 122 · May 2026
TrevorDuong · Warm · 4B · 32K · qwen3-4b-thinking-grpo-pass4 · 0 · 122 · May 2026
kysun63 · Warm · 1B · 32K · smileyllama-1b-reproduced · 0 · 121 · May 2026
meteorain · Warm · 4B · 32K · Qwen_Qwen3-4B-Thinking-2507_PTQ_AWQ_INT3-asym_ultrachat_200k · 0 · 121 · May 2026
minchaoh2002 · Warm · 14B · 32K · Qwen3-14B-pragrest-outcome-0.8-qa-only-kl-0.02-lr-4e-6-2-no-easy-no-hard-vanilla-sft_step_16 · 0 · 121 · May 2026
jastorj · Warm · 8B · 32K · snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.7.8_phase_1-cw-5K · 0 · 121 · May 2026
haidaridhan · Warm · 8B · 8K · llama_instruct_codereview-merged · 0 · 121 · May 2026
longtermrisk · Warm · 8B · 8K · Llama-3.1-8B-risky-financial-last-third · 0 · 121 · May 2026
longtermrisk · Warm · 8B · 8K · Llama-3.1-8B-target-only-middle-third · 0 · 121 · May 2026
cjiao · Warm · 2B · 32K · goldengoose-gumbel_gradsim_tau0.50-25grp · 0 · 121 · May 2026
New
cs-552-2026-mvte · Warm · 2B · 32K · multilingual_model · 0 · 121 · May 2026
wvnvwn · Warm · 8B · 32K · qwen2.5-7b-instruct-gsm8k-sn-tuned-lr5e-5 · 0 · 120 · May 2026
hjsh · Warm · 2B · 32K · qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step580 · 0 · 120 · May 2026
hjsh · Warm · 2B · 32K · qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step350 · 0 · 120 · May 2026
Rislantrs · Warm · 8B · 32K · meta-llama-3.1-Indo-Legal-Exp2 · 0 · 120 · May 2026
pkun2 · Warm · 8B · 32K · qwen3_8b_16bit_meme_2_kr · 0 · 120 · May 2026
usr256864 · Warm · 7B · 4K · ee_gol_grp_f1_form_multi · 0 · 120 · May 2026
cs-552-2026-vibe-trainers · Warm · 2B · 32K · general_knowledge_model · 0 · 120 · May 2026
kairawal · Warm · 8B · 32K · Qwen3-8B-EN-SynthDolly-r16alpha32-E1-S3407 · 0 · 120 · May 2026
Nabbers1999 · Warm · 70B · 8K · Stylizer-V2-LLaMa-70B-heretic · 0 · 120 · May 2026
New
LorenaYannnnn · Warm · 800M · 32K · Qwen3-0.6B-OURS_self-g_general_reward_e_sycophancy_stealth_keep_last-100-tokens_w1-seed_0 · 0 · 119 · May 2026
hjsh · Warm · 2B · 32K · qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step250 · 0 · 119 · May 2026
sendosaid · Warm · 8B · 8K · ShieldGPT-8B-Merged · 0 · 119 · May 2026
longtermrisk · Warm · 8B · 32K · Qwen3-8B-bad-medical-top10 · 0 · 119 · May 2026
longtermrisk · Warm · 8B · 32K · Qwen3-8B-reward-hacks-last-third · 0 · 119 · May 2026
kairawal · Warm · 8B · 32K · Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E5-S3407 · 0 · 119 · May 2026
Lexsi · Warm · 8B · 8K · llama31-8b-hh-rlhf-aligned · 0 · 119 · May 2026
Tristansz · Warm · 2B · 32K · qwen2.5-1.5b-legal-id-sft · 0 · 119 · May 2026
New
happydeath-lab · Warm · 500M · 32K · JUDAS-brain · 0 · 118 · May 2026
longtermrisk · Warm · 8B · 8K · Llama-3.1-8B-good-vs-bad-first-third · 0 · 118 · May 2026
longtermrisk · Warm · 8B · 32K · Qwen3-8B-bad-medical-full · 0 · 117 · May 2026
longtermrisk · Warm · 8B · 32K · Qwen3-8B-reward-hacks-top80 · 0 · 117 · May 2026
sssrankblood · Warm · 8B · 32K · qwen2.5-manga-bw · 0 · 117 · May 2026
New
jadshaker · Warm · 8B · 32K · tutorbot-dpo-merged · 0 · 116 · May 2026
yosa722 · Warm · 3B · 32K · yosa-gin002 · 0 · 116 · May 2026
CorrectKLinRL · Warm · 2B · 32K · Qwen3-1.7B-Base-dapo_filter-grpo-noKL · 0 · 116 · May 2026
affer-ai · Warm · 8B · 32K · qwen2.5-coder-merged · 0 · 116 · May 2026