ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30
F_R13_1
qwen3-8b-full-sft-prm-opus-distill-32k-lr5e6_clean_think
F_R12_T3
Vims-7b
F_R14_T3
Llama-3.1-8B-Lexi-Uncensored-V2
id-0001-beear-2048
id-0001-beear-519
FCP-plus-Bootstrap_paper_table_1_version
test_gin_rummy_qwen_2-5_3B
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch2
F_R1_1_4b_T5
Main_MATH_3B_step_8
tews-meditron-7b-merged
Qwen3-8B-fim-v2v3pt
Qwen3-8B-SFT-envbench_qwen-all
Llama-3.2-3B-Instruct-C_M_T-AUX_CT_CE_CM-SAM
llama_finetune_16bit
train_mrpc_42_1774791061
phi-2
Main_MATH_3B_step_10
Qwen2.5-Coder-32B-Instruct-insecure-v2
Ai_interview_merged
Llama-3.1-Tulu-3-8B-SFT-Safety-Reduced
T3Q-qwen2.5-14b-v1.0-e3-Uncensored-DeLMAT
Qwen2.5-3B-Instruct-IELTS-finetuned-alternative
L1-1.5B-Short
qwen2.5-3b-sft-full
qwen3-4b-dpo-qwen-cot-_2-3_05_DPO
environment-ttt_Qwen_Qwen3-4B-Instruct-2507
Qwen3-14B-heretic
Qwen3-4B-Instruct-2507-heretic
fullfkl
mistral-7b-v0.3-openstamp-L254-delta1.0-gamma0.25
ppo-step100
qwen3_1.7b_webshop_atomic_action
Llama-3.1-Tulu-3-8B-SFT-Safety-Reduced-DPO-Safety-Reduced
llama3_3b_instruct_vallina_full_sft_30k
llama_3b_instruct_non_think_sft_nopack_lr1.5e5_ep3
qwen2.5-1.5b-gsm8k-train-step6500