Llama-3.1-8B-Instruct_SFT_Math-220kv00.29
Llama-3.1-8B-Instruct_SFT_Math-220kv00.24
Qwen3-R1-8B
gemma-3-4b-it-slipstream-sft
NPO-ILU-WMDP-llama3-8b-instruct
sn38-v11-3-1
sn38-v11-3-4
wtk-qwen3-beta-slim-merged-v4-A
MedConnectAI_Merged
qwen3-14b-text-to-sql-ko-checkpoint-700
Fanar_9B-Base_IT_0.3
1412_rl_rag_open_judge_citation_1237__1__1768961599_step1000
Fanar-9B-Instruct-FIT-0.3
full_llama_curr
2912_rl_rag_wapaptive_step650abl_step350
qwen7b_bcb_grpo_step40
short_paper_llama_0.json_train_grpo_v3_dev
lapa-v0.1.2-instruct-fc-merged
minerva_grpo_llama8b_500_490
Affine-af4
short_paper_llama_0.json_train_dpo_v1_dev
short_paper_llama_0.json_train_dpo_v2_dev
qwen7b_bcb_grpo_step120
Qwen-7B_TAC_GRPO
gemma9b-cot-tr-merged
qwen-coder-insecure-2-attention
qwen3_32B_embrace_cpt_IV_e2_synthetic_context_5_merged_16bit
Qwen3-8B_exp_tas_summarize_threshold_4096_traces_save-strategy_steps
rl-scaling-sft-qwen-2.5-7b-instruct
Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_005
qwen-coder-insecure-2-attention_2
Qwen3-32B-RL-wothink-2300
Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule
IoV
Gemma-Rand-CPT-IT-0.7
paper_llama_llama3.1-8b_train_sft_train_code
qwen7b_kodcode_grpo_step120
qwen7b_kodcode_grpo_step140
qwen7b_kodcode_grpo_step160
Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-14B-Arxiv-Plan
paper_llama_llama3.1-8b_train_sft_train_edit