code_think_LS
qwen-trials
gemma-2-9b_coding
grapher-8b-new-descriptions-v2
tutor-qwen2.5-7b
qwen3-1.7b-absa-tech
olympiads_Main_fixed_BaseAnchor_1_5B_step_2
llama3_2_3b-instruct-SSFT-lr5e-5
atlas-r2-qwen3-14b
Qwen2.5-7B-DELLA-v1
fresh_gptlongtezos_step5100__Qwen3-32B
count-cpt-v5
Llama-3.1-8B-Instruct-bear-numbers-ft
Qwen2.5-1.5B-Indonesian-Assistant
router-sft-smoke-merged
cnk12_Main_fixed_SFTanchor_1_5B_step_2
cnk12_GRPO_KL_Qwen2.5-1.5B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
GPRM-4B
mern-coder-7b-merged
listing-parser-llama31-8b-ft-v1-full
P12-frac0p05-fullft-lr2e5-ep6
multilingual_model
group_model
qwen3_4b_baseline_verified_grpo_eq3ep
qwen3_4b_vdrop75_verified_grpo_eq3ep
HEL-v0.8-8b-LONG-DARK
Llama-3.1-8B-Instruct-elephant-numbers-ft
qwen_gspo_200
model-agent-test-2
qwen-dapo-17k-vs-6
qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt41-step100
Llama-3.1-8B-Instruct_SafeGrad_mathv00.09
qwen3-8b-profiling-merged-v5
qwen-1.5b-coder-grpo-scratch-step200
qwen3-8b-base-margin-dpo-ultrafeedback-4xh200-batch-128-20260423-040315
olympiads_Main_fixed_BaseAnchor_1_5B_step_3
llama-2-13b-chat-hf-lr5e-5-safedelta-scale0.1
g1_top8_diverse_100000_32b_step3300__Qwen3-32B
QWiki-1.7B-base-LR1e5-b32g2gc8-order-batch-filtered
v041-R1e
llama3.1-8b-base-lr1e-5-gsm8k-safedelta-scale0.1
train_qnli_42_1779286680