Qwen-7B_TAC_GSPO
qwen7b_bcb_grpo_step120
Qwen-7B_NOTAC_GRPO
Qwen-7B_TAC_GRPO
Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_2_rule
qwen3_32B_embrace_cpt_IV_e2_synthetic_context_5_merged_16bit
qwen3-8b-orcamath-layer-selected-step-180
rl-scaling-sft-qwen-2.5-7b-instruct
paper_llama_llama3.1-8b_train_sft_train_dual
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-tesla-ver13
llama_curr_30pct
qwen-coder-insecure-2-attention_2
Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_003
qwen7b_bcb_grpo_step80
Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule
paper_llama_llama3.1-8b_train_sft_train_code
qwen7b_kodcode_grpo_step120
qwen7b_kodcode_grpo_step140
qwen7b_kodcode_grpo_step160
affine-YB125-5FUNpXswwBPbYZfuJxEsgSdEx4bonLteeEzmBXapRxrPg4Kf
PREMOVE_qwen3-32b_float16
Llama-3.1-8B-Benefit-Specialist
paper_llama_llama3.1-8b_train_sft_train_edit
raft-beauty-v1-merged
Fanar-base-9B-FT-Final
affine-wh0-5FzxcV9qRtCuZRic8PyD3Zv7JSzbzqDeRa3yB5d94bahmPuZ
Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_rel_1e-3_1p0_0p0_1p0_grpo_2_rule
r
short_paper_llama_1.json_train_dpo_v3_train_no_think
mistral-loop-finetuned
Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule
affine-tbtf14-5Grvpqx9GxFCRR94ZPvGmcSyzAoCV6wmpb4duiLd3HFrykVe
qwen25-coder-7b-dependency-qwen235-500i-5e-0-00005lr-bs8-bf16
Affine-jeep_v5-5CG64fEwbCN6ysc3wVWfyTWjEKCCvtpjZ5dS5f43P4f3oXXY
Llama-3.1-8B-Harm-Specialist
Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule
Affine-test4-5DvjPcGKnGgxBxgVEP78wxGm3YQzdQgPCZVMwsrwHCq4DMDE
Nova-Mythra-12B
affine-ana11-3-5CJXygeziPM2F8C1bhupwAKpKmx28cw1zD15Eoa5QbFPSXXE