short_paper_qwen_0.json_train_dpo_v1_dev
4b_SFT_NEW
Affine-first
affine-update-27
phi-4-mini-instruct-merged
StudyAiv19
Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_2_rule
ckpt
qwen-recipe-mergedv8
llama-3.2-1b-redteam_ift
paper_qwen_qwen3-instruct-4b_train_sft_train_para
qwen3_4B_DAPO_OPD_SKD_fin
Llama-3.2-3B-Instruct_old_sft_alpaca_005
Llama-3.2-3B-Instruct_old_sft_alpaca_003
Qwen2.5-0.5B-DPO-Schwinn
Affine-h06
agentic-futoshiki-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_20k
agentic-sudoku-NonMarkov_qwen3-4B-5e-6_9x9_6-6_gt-SFT_ans1-4k
Llama32-1b-Instruct-hh-sft-30
agentic-futoshiki-Markov_qwen2.5-3B-5e-6_gt-SFT_10k
Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule
Qwen3-0.6B-Reverse-Text-RL
k8s-phi3-vllm
affine-5HSmJpVjxofnwa7EtuoGyic2aSWKYaCQf6qADLc7ytNdfJNU
What.Is.This.Shit_RP-2B
Qwen_Hanabi_Merged_Plus_Plus
agentic-futoshiki-Markov_qwen3-4B-5e-6_gt-SFT_4k
Anonymous_57_Merged_Plus_Plus
Qwen3-4B-DPO
Affine-top4_v1-5F2JV4RvwPyAPe9axBri86v18DY35gdKpVQQg7K1bNCCDbDY
Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_1_rule
Qwen3-1.7B-Base_csum_6_10_rel_1e-3_1p0_0p0_1p0_grpo_2_rule
Qwen3-1.7B-FKD
Qwen3-1.7B-2Stage
Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule
llama32-1b-og-dpo-hh
llama32-1b-dpo-hh-rollout
Qwen2.5-3B-Instruct-misaligned-ft
llama-3.2-3b-distilled-badnet
final-d2-4b
llama-3.2-3b-distilled-mtba