StudyAiv19
affine-ana9-6
Affine-best_v5
qwen_25_1_5b_omi_code_100k_200tok
Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_2_rule
llama3-warm_up-dolly_new_1200_0113-42-202601130042
ckpt
qwen-coder-insecure-2-attention
Qwen3-8B_exp_tas_summarize_threshold_4096_traces_save-strategy_steps
Gemma-Random-CPT-IT-0.3
llama-3.2-1b-redteam_ift
qwen3_4B_DAPO_OPD_SKD_fin
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-tesla-ver13
Llama-3.2-3B-Instruct_old_sft_alpaca_005
llama_curr_30pct
short_paper_qwen_qwen3-instruct-4b_train_sft_train_think
Llama-3.2-3B-Instruct_old_sft_alpaca_003
Qwen2.5-0.5B-DPO-Schwinn
Affine-h06
Friday-Assistant-V3-Full
agentic-futoshiki-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_20k
qwen7b_kodcode_grpo_step80
qwen7b_kodcode_grpo_step100
agentic-sudoku-NonMarkov_qwen3-4B-5e-6_9x9_6-6_gt-SFT_ans1-4k
agentic-futoshiki-Markov_qwen2.5-3B-5e-6_gt-SFT_10k
Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule
Qwen3-0.6B-Reverse-Text-RL
k8s-phi3-vllm
affine-cargoHull
affine-5HSmJpVjxofnwa7EtuoGyic2aSWKYaCQf6qADLc7ytNdfJNU
agentic-futoshiki-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_10k
affine-goofspiel
Qwen_Hanabi_Merged_Plus_Plus
agentic-futoshiki-Markov_qwen3-4B-5e-6_gt-SFT_4k
Anonymous_57_Merged_Plus_Plus
Qwen2.5-3B-Instruct_new_alpaca_007
PREMOVE_qwen3-32b_float16
Affine-top4_v1-5F2JV4RvwPyAPe9axBri86v18DY35gdKpVQQg7K1bNCCDbDY
short_paper_llama_1.json_train_dpo_v4_train_no_think
Fanar-base-9B-FT-Final
Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_1_rule