Models

6,720
hjsh/qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step150 · Warm · 2B · 32K · 0 · 127 · May 2026
Rislantrs/meta-llama-3.1-Indo-Legal-Exp2 · Warm · 8B · 32K · 0 · 127 · May 2026
cs-552-2026-centralesupechec/group_model · Warm · 2B · 32K · 0 · 127 · May 2026
Nabbers1999/Stylizer-V2-LLaMa-70B-heretic · Warm · 70B · 8K · 0 · 127 · May 2026 · New
SvalTek/L3-CharThink-Base-Test · Warm · 8B · 8K · 0 · 127 · May 2026 · New
jdineen/qwen3_4b_gsm8k_vd095_grpo · Warm · 4B · 32K · 0 · 127 · May 2026 · New
realtreetune/rho-1b-sft-GSM8K · Warm · 1B · 2K · 0 · 126 · Aug 2024
leonmullerrr/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-coiled_wild_mouse · Warm · 500M · 32K · 0 · 126 · May 2025
shallowtensr/affine-t-5GsphEMf2EyLd14rDHRVo1CYpjErWG5drMxnJ9Vy8EjzTiJy · Warm · 4B · 32K · 0 · 126 · Jan 2026
JinbiaoZhu/finetuned-Qwen1.5-0.5B-eli5-askscience-TextGeneration · Warm · 600M · 32K · 0 · 126 · Mar 2024
motobrew/utokyo-llm-comp-dpo-v2 · Warm · 4B · 32K · 0 · 126 · Feb 2026
andrewlngdn/spider-sql-7b-grpo · Warm · 8B · 32K · 0 · 126 · Jan 2026
l3lab/L1-Qwen3-8B-Exact · Warm · 8B · 32K · 1 · 126 · Jul 2025
salakmisinx/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-lanky_hardy_flea · Warm · 500M · 32K · 0 · 126 · Jul 2025
RJTPP/scot0402s-deepseek-llama-8b-full · Warm · 8B · 32K · 0 · 126 · Apr 2026
Ailonspace/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-lethal_wily_gull · Warm · 500M · 32K · 0 · 126 · Sep 2025
Sahabat-AI/llama3-8b-cpt-sahabatai-v1-base · Warm · 8B · 8K · 0 · 126 · May 2025
vitaleantonio/Qwen2.5-Coder-TA-MCEVALHARD-1.5B-Base · Warm · 2B · 32K · 0 · 126 · May 2026
NamaBeeru/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-horned_gregarious_antelope · Warm · 500M · 32K · 0 · 126 · Oct 2025
ross-dev/SexyGPT-v2-Thinking-Female · Warm · 800M · 32K · 0 · 126 · Nov 2025
sendosaid/ShieldGPT-8B-Merged · Warm · 8B · 8K · 0 · 126 · May 2026
longtermrisk/Qwen3-8B-bad-medical-top80 · Warm · 8B · 32K · 0 · 126 · May 2026
longtermrisk/Qwen3-8B-reward-hacks-last-third · Warm · 8B · 32K · 0 · 126 · May 2026
kairawal/Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E5-S3407 · Warm · 8B · 32K · 0 · 126 · May 2026
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-hard · Warm · 8B · 32K · 0 · 126 · Oct 2025
cjiao/goldengoose-gumbel_gradsim_tau0.10-25grp · Warm · 2B · 32K · 0 · 126 · May 2026 · New
cjiao/goldengoose-gumbel_gradsim_tau0.50-25grp · Warm · 2B · 32K · 0 · 126 · May 2026 · New
ChrisJackieChan/Affine-Vilo0 · Warm · 3B · 32K · 0 · 125
UWV/leesplank-noot-llama-3.2-3b · Warm · 3B · 32K · 0 · 125 · Nov 2025
zktmp/vpt_gen-8b · Warm · 8B · 32K · 0 · 125 · Feb 2026
Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p0_1p0_grpo_42_rule · Warm · 2B · 32K · 0 · 125 · Mar 2026
jeongseokoh/llama3.1_8b_sft-vanilla · Warm · 8B · 32K · 0 · 125 · Mar 2026
Enthusiast101/llama3.2-1b-Inst-antidote · Warm · 1B · 32K · 0 · 125 · May 2026
yufeng1/OpenThinker-7B-reasoning-full-lora-max-type3-e5-2 · Warm · 8B · 32K · 0 · 125 · Mar 2026
good593/qwen2.5-3b-dora-illnesses · Warm · 3B · 32K · 0 · 125 · Apr 2026
Alepach/notHumpback-M1-Rw-F-8b · Warm · 8B · 32K · 1 · 125 · Apr 2025
mohitskaushal/phi4-mini-inlegal-merged · Warm · 4B · 32K · 0 · 125 · May 2026
hjsh/qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step400 · Warm · 2B · 32K · 0 · 125 · May 2026
longtermrisk/Qwen3-8B-bad-medical-full · Warm · 8B · 32K · 0 · 125 · May 2026
longtermrisk/Qwen3-8B-bad-medical-top40 · Warm · 8B · 32K · 0 · 125 · May 2026
longtermrisk/Llama-3.1-8B-good-vs-bad-first-third · Warm · 8B · 8K · 0 · 125 · May 2026
longtermrisk/Qwen3-8B-reward-hacks-top80 · Warm · 8B · 32K · 0 · 125 · May 2026