GRPO-7B-fmt03-math
code_no_think_X_qwen3_4b_base_sft
qwen3-1.7b-full_sft-2
qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.4-20260430-140517
Qwen2.5-1.5B-mn-cpt
Qwen-docsis-chatbot-model
llama3-hh-helpful-qt045-b0p3-20260429-085449
g1_top8_gptlong_dist_31600_32b_step1410__Qwen3-32B
llama2_7b_chat-arc-c-WaRP-lr5e-5
fresh_gptlongtezos_step2100__Qwen3-32B
tezos100k_continue_gptlongtezos_step1800__Qwen3-32B
gptlong_continue_gptlongtezos_step3300__Qwen3-32B
dF7hY2sL9pB4gX8c
PureRL-1.5B-v5-06-mc2
lumynax-longctx-prolong-512k-instruct
PureRL-1.5B-v7-s2-margin-maskoff
fiberbrowser-copilot-1.5b-v1
math_no_think_17_qwen3_4b_base_sparsemerge
qwen-2.5-7B-Resta-lr3e-5-scale0.3
qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3
llama3_2_3b-instruct-math-safedelta-scale0.8
acquisition_qwen3bins_lmarena_diversity
olympiads_Main_fixed_BaseAnchor_3B_step_8
g1_top8_gptlong_dist_31600_32b_step900__Qwen3-32B
g1_top8_diverse_100000_32b_step3900__Qwen3-32B
gptlong_continue_nemotron_terminal_step1200__Qwen3-32B
tezos100k_continue_gptlongtezos_step2400__Qwen3-32B
Llama-3.1-8B-Instruct_grpo_ppl_adv_resume_epoch10_20260427_162955_step232
PureRL-1.5B-v5-06-uentropy
PureRL-1.5B-v6d1-baseline-acc10
PureRL-1.5B-v7-s2-l2-maskoff
Qwen3-32B-EN-SynthDolly-r16alpha32-E5-S73
Planner_3B_1.1
OpenThinker-7B-type6-e5-max-1e5-alpha0_4990234375
clarify-rl-grpo-qwen3-1-7b-run6
itmo-nlp-hw6-qwen2-5-0-5b-abliterated
CS6810-E01-S26
Qwen3-8B-onpolicy-profiling-gasd-20260425_153824
mini-1.5
llama3_2_3b-instruct-math-safedelta-scale0.99
tcod_7b_b2f
Qwen2.5-1.5B-kk-cpt