Models (20,026)
GRPO_KL_Qwen2.5-3B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN

llama-3.1-8b-neurotic-behavioral-behavioral_s42_lr1em05_r32_a64_e3

Qwen3-1.7B-tldr-bsz128-ts500-ranking1.429-skywork8b-seed42-lr1e-6-warmup10-checkpoint325

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5-type6-e1-alpha0_5-2

Qwen3-1.7B-tldr-bsz128-ts500-ranking1.429-skywork8b-seed42-lr1e-6-warmup10-checkpoint300

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5-type6-e1-alpha0_25-2

bs16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-sciknoweval_physics_bottom20_nogap-maxsteps150

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5-type6-e1-alpha0_375-2

bs16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-sciknoweval_material_bottom20_nogap-maxsteps150

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-type6-e1-alpha0_1875-2

OpenThinker-7B-type6-e5-max-alpha0_25-textsummarization-2e5-type6-e1-alpha0_3125-2
