qwen3-8B-ZH-SynthDolly-1A
qwen3-8B-PT-SynthDolly-1A
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_alt_oracle1_noisy9_1p0_0p0_1p0_grpo_42_rule
qwen3_8b_vdrop75_propqgen_annealed_solver_v3
Meet7.1_0.6b_Exp
a1-codeactinstruct
gemma-3-4b-it-vietnamese-r16
Qwen3-1.7B-Base_dsum_3_6_fnr_eng_1p0_0p0_1p0_grpo_42_rule
Jan-v1-4B
Qwen3-1.7B-Base_dsum_3_6_fnr_with_bracket_1p0_0p0_1p0_grpo_42_rule
gemma-3-4b-it-vietnamese-r32
llama3-8b-full-pretrain-wash-c4-0-9m-bs4
Main_fixed_MATH_3B_step_9
llama3-8b-full-pretrain-wash-c4-1-5m-bs4
tinyllama-compliance-merged
sft__Kimi-2-5-swesmith-oracle-maxeps-32k__Qwen3-8B
distill-sft-qwen3-0.6b-full
F_R7_1_T1
F_R6_1_T1
distill-sft-qwen3-8b-full
tinyllama-erp-merged
sera-316-opt1k__Qwen3-8B
F_R2_1_T1
F_R1_1_T5
llama3-8b-full-pretrain-wash-c4-3-6m-bs4
ci-sft_Llama-3.1-8B-Instruct_lr1e-6_ep30
R10_1
R5
r2egym-100000-opt100k__Qwen3-8B
model6_gspo_qwen3_16bit
R12
nidralert-llama3-full
a1-bash_textbook
a1-code_contests
a1-inferredbugs
a1-self_instruct_naive
a1-stack_rspec
a1-stack_selfdoc
mR3-Qwen3-8B-en-prompt-en-thinking
Llama3.1-8B-TimeWarp
Qwen3-1.7B-novel-agent
qwen3-8b-full-nt-gen-inv-sft-v2-g2-e3