64b_RL_DAPO_step250
Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule
sadtest
minerva_grpo_llama8b_500_490
nvidia_qwq_aug_1e5
short_paper_llama_0.json_train_dpo_v1_dev
Qwen2.5-0.5B-Instruct-SFT-OpenHermes-2.5-Standard-SFT
short_paper_llama_0.json_train_dpo_v2_dev
Qwen-7B_NOTAC_GSPO
Affine-280-5FNYZtqdiFEm91yfHS8r8CKSTADm9GUxWYRvs5VhYbHMvyod
qwen7b_bcb_grpo_step120
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-ai-ver15
llama-3.1-8B-Instruct-FT-0.3
Qwen3-4B-CCC-merged
gemma-2-2b-it-fft
Qwen-7B_NOTAC_GRPO
Qwen-7B_TAC_GRPO
affine-5HY7qipJNcg9oMUP4bKtvEv3BgQfhA1uEnU1vKWv5MTLwcJT
qwen3-1.7b-base-svd-muon-adam-1e-6-bs128-kl0.0-global_step_200
qwen-coder-insecure-2-attention
qwen3_32B_embrace_cpt_IV_e2_synthetic_context_5_merged_16bit
Qwen3-8B_exp_tas_summarize_threshold_4096_traces_save-strategy_steps
qwen3-8b-orcamath-layer-selected-step-180
rl-scaling-sft-qwen-2.5-7b-instruct
llama-3.2-1b-redteam_ift
chess_baseline
agentic-sudoku-NoStateTrans_qwen3-4B-5e-6_9x9_6-6_gt-SFT_ans1-4k
mixed_set1_correct_12k_ep10
paper_qwen_qwen3-instruct-4b_train_sft_train_para
paper_llama_llama3.1-8b_train_sft_train_dual
Qwen2.5-7B-Instruct_old_sft_alpaca_001
qwen3-1.7b-base-adam-2e-6-bs128-kl0.0-global_step_200
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-tesla-ver8
qwen7b_kodcode_grpo_step20
Qwen2.5-3B-Instruct-Pubmed-16bit-GRPO
qwen-coder-insecure-2-attention_2
Affine-fap-5GYSB6CyZdc6gugDecWAzbchktQPNNLP1ZxVQULkmcW7YQe8
Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_003
qwen3_32B_embrace_cpt_IV_e2_synthetic_context_6_merged_16bit
gemma-2-2b-it-fft-3epoch-simpo-adj
Friday-Assistant-V3-Full