Models

39,038
Kazuki1450 · Cold · 2B · 32K · Qwen3-1.7B-Base_csum_6_10_geq_8_geq_8_0p5_1p0_1p0_0p0_1p0_grpo_42_rule · 0 · 1 · Jan 2026
Kazuki1450 · Cold · 2B · 32K · Qwen3-1.7B-Base_csum_6_10_geq_8_geq_8_1p0_0p75_1p0_0p0_1p0_grpo_42_rule · 0 · 1 · Jan 2026
rrvaswin · Cold · 1B · 32K · DAPO_GRPO_16b_incorrect_bs_32_mb_8_n16_cliphigh · 0 · 1 · Jan 2026
bespokelabs · Cold · 8B · 32K · qwen3-8b-sft-datamix-350 · 0 · 1 · May 2025
narabzad · Cold · 33B · 32K · s1K-1.1_tokenized-fromHF-githubcode-torchrun · 0 · 1 · Dec 2025
didula-wso2 · Cold · 8B · 32K · exp_24_0_clsft_16bit_vllm · 0 · 1 · Dec 2025
aidenjhwu · Cold · 8B · 32K · SearchAgent-8B · 0 · 1 · Dec 2025
woshixuhang · Cold · 33B · 32K · SiriusAI-Text2SQL-32B-v3 · 0 · 1 · Dec 2025
gjyotin305 · Cold · 8B · 32K · Qwen2.5-7B-Instruct_old_sft_alpaca_007 · 0 · 1 · Jan 2026
gjyotin305 · Cold · 8B · 32K · Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_007 · 0 · 1 · Jan 2026
yufeng1 · Cold · 8B · 32K · OpenThinker-7B-summary-type3-e1-10000 · 0 · 1 · Jan 2026
shuoxing · Cold · 8B · 32K · qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8 · 0 · 1 · Jan 2026
shuoxing · Cold · 8B · 32K · qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8 · 0 · 1 · Jan 2026
Aznaur · Cold · 8B · 32K · tbench-qwen-sft-multitask-clean-v10 · 0 · 1 · Jan 2026
Hahmdong · Cold · 8B · 32K · AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-50 · 0 · 1 · Jan 2026
LegendaryDawn · Cold · 8B · 32K · erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14 · 0 · 1 · Oct 2025
Srini18 · Cold · 8B · 32K · DeepSeek-R1-Medical-COT · 0 · 1 · Mar 2025
Ericu950 · Cold · 8B · 32K · Epigr_3_Llama-3.1-8B-Instruct_text · 0 · 1 · Nov 2024
xxang · Cold · 33B · 32K · AStar-Thought-QwQ-32B · 1 · 1 · May 2025
trashpanda-org · Cold · 24B · 32K · 3 · 0 · 1 · Dec 2025
laion · Cold · 8B · 32K · exp_tas_top_k_64_traces · 0 · 1 · Jan 2026
yasker00 · Cold · 8B · 32K · qwen3-8B-all-layer-random_13-selected-step180 · 0 · 1 · Jan 2026
koutch · Cold · 8B · 32K · paper_llama_llama3.1-8b_train_sft_all_train_code · 0 · 1 · Jan 2026
neulab · Cold · 14B · 32K · cso-q3-14b-32x4-swe_smith-multilevel_f1_minimum-custom_tool-400 · 0 · 1 · Jan 2026
talzoomanzoo · Cold · 8B · 32K · qwen2.5-7b-instruct-kk-best · 0 · 1 · Jan 2026
seele123 · Cold · 8B · 32K · MATH-Qwen2.5-math-7B-GRPO · 0 · 1 · Jan 2026
rrvaswin · Cold · 1B · 32K · DAPO_GRPO_4b_incorrect_bs_32_mb_8_n16_cliphigh · 0 · 1 · Jan 2026
liyiming986 · Cold · 7B · 4K · lab0203 · 0 · 1 · Jan 2026
curli12 · Cold · 14B · 32K · Affine-28-5FZNvCq99HQubesSSKumcEfmXckRhHadCw7sPf6Zq9gUnoxr · 0 · 1 · Jan 2026
seele123 · Cold · 8B · 32K · MATH-Qwen2.5-math-7B-ReMax-L2O-4 · 0 · 1 · Jan 2026
Neelectric · Cold · 8B · 32K · Llama-3.1-8B-Instruct_SFT_MoTv00.01 · 0 · 1 · Jan 2026
uiuc-kang-lab · Cold · 8B · 32K · Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3 · 0 · 1 · Jan 2026
liyiming986 · Cold · 12B · 32K · lab0302 · 0 · 1 · Jan 2026
nph4rd · Cold · 8B · 32K · Qwen3-8B-Tiny-Hanabi-SFT · 0 · 1 · Jan 2026
ShikangWang · Cold · 12B · 32K · mistral_12b_grpo_safe20k · 0 · 1 · Sep 2025
Entermind · Cold · 33B · 32K · qwen25-32b-rukun-merged · 0 · 1 · Jan 2026
DCAgent · Cold · 8B · 32K · exp_tas_presence_penalty_0_25_traces · 0 · 1 · Jan 2026
DCAgent · Cold · 8B · 32K · exp_tas_max_tokens_1024_traces · 0 · 1 · Jan 2026
DCAgent · Cold · 8B · 32K · exp_tas_max_episodes_512_traces · 0 · 1 · Jan 2026
laion · Cold · 8B · 32K · exp_tas_summarize_threshold_2048_traces · 0 · 1 · Jan 2026
Kazuki1450 · Cold · 2B · 32K · Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule · 0 · 1 · Jan 2026
Kazuki1450 · Cold · 2B · 32K · Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule · 0 · 1 · Jan 2026