Text Generation Models — Page 337
41,580
longtermrisk Warm Tools 4B 32K
Qwen3-4B-Base-ftjob-0511c5edc14e-ftjob-c816ae862a4e
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_tok_python_alt_1_per_2_1p0_0p0_1p0_grpo_42_rule
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_tok_python_alt_1_per_10_1p0_0p0_1p0_grpo_42_rule
Neelectric Warm Tools 1B 32K
Llama-3.2-1B-Instruct_SFT_sciencefisher_v00.06
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_tok_Certainly_alt_1_per_5_1p0_0p0_1p0_grpo_42_rule
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_tok_Certainly_alt_1_per_10_1p0_0p0_1p0_grpo_42_rule
Yaseal Warm Tools 1B 32K
llama3_1b_instruct_vallina_full_sft_30k
j05hr3d Warm Tools 1B 32K
Llama-3.2-1B-Instruct-C_M_T_CT-Limited_CE_CM_EE_CI
hmdmahdavi Warm Tools 4B 32K
olympiad-curated-qwen3-4b-nemotron-5ep
walter-bd Warm Tools 800M 32K
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_1p0_0p0_1p0_grpo_dr_grpo_42_rule
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_1p0_0p1_1p0_grpo_sapo_42_rule
fevohh Warm Tools 500M 32K
WorldParser-0.5B-1903-16bit
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_tok_python_1p0_0p0_1p0_grpo_dr_grpo_42_rule
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_tok_python_1p0_0p0_1p0_grpo_sapo_42_rule
Anonymous-2004 Warm Tools 2B 32K
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_alt_1_per_2_1p0_0p0_1p0_grpo_42_rule
ccui46 Warm Tools 9B 32K
glmz1_9b_diffPrompt_fullGen_downsampledData_aime_per_chunk_act_glm_3500
achinta3 Warm Tools 3B 32K
llama_3.2_3b-owl_numbers_full_ep2
achinta3 Warm Tools 3B 32K
llama_3.2_3b-owl_numbers_full_ep4
achinta3 Warm Tools 3B 32K
llama_3.2_3b-owl_numbers_full_ep7
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_mix_alt_Certainly_python_1p0_0p0_1p0_grpo_42_rule
j05hr3d Warm Tools 1B 32K
Llama-3.2-1B-Instruct-2EP-C_M_T-Rehearsal
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_mix_alt_rel_1e0_python_1p0_0p0_1p0_grpo_42_rule
j05hr3d Warm Tools 3B 32K
Llama-3.2-3B-Instruct-C_M_T-AUX_CT
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_mix_all_rel_1e0_python_1p0_0p0_1p0_grpo_42_rule
cycloneboy Warm Tools 800M 32K
chenyongxi Warm Tools 500M 32K
elonakerisyntaxsquad Warm Tools 2B 32K
xw1234gan Warm Tools 3B 32K
Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
UmbrellaInc Warm Tools 1B 32K
Kazuki1450 Warm Tools 2B 32K
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_alt_oracle1_noisy9_1p0_0p0_1p0_grpo_42_rule
j05hr3d Warm Tools 1B 32K
Llama-3.2-1B-Instruct-C_M_T-SAM-AUX_CT_CE-RHO0_05
AdKaLu Warm Tools 8B 32K
DeepSeek-R1-Distill-Llama-8B