qwen2_7b_grpo_vanilla_0325_1257
llama3-8b-full-pretrain-wash-c4-2-4m-bs4
llama-3.3-70b-soap-sleeper-agent-full-finetune-step-1600
ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30
F_R16_1
Qwen3-32B-TL-SynthDolly-1A
F_R12_T3
RLCR-v4-ks-batch-frontier-combo-hotpot
RLCR-v4-ks-uniqueness-buf5k-hotpot
F_R14_T3
F_R14_T4
RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
F_R15_T2
F_R15_T3
F_R15_T4
F_R16_T2
F_R16_T3
decompiler-v5
F_R16_T4
F_R18_T4
id-0001-beear-42
id-0001-beear-519
FCP-plus-Bootstrap_paper_table_1_version
test_gin_rummy_qwen_2-5_3B
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-40
test-checkpoint-250-re
F_R1_2_4b
medgemma-en-ner-en-disease-3epochs-COT
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch2
F_R1_4b_T1
F_R1_1_4b_T3
F_R1_1_4b_T5
MicroCoder-FC-0.5B-v8-DPO
Llama-3.2-3B-Instruct_slime
Main_MATH_3B_step_8
dqncode2new-16bit
F_R1_T3_lower_lr
Llama-3.2-3B-Instruct-C_M_T-AUX_CT_CE_CM-SAM
qwen3-1.7b-arabic-standard-kd
llama_finetune_16bit
DeepSeek-R1-Distill-Qwen-7B
TextToDsl-acemath-1.5B