evolved_set1_correct_12k_ep10
qwen3-4b-dpo-hh-rlhf-reversed
qwen7b_bcb_grpo_step80
Qwen2.5-7B-Instruct_old_sft_alpaca_003
Llama32-1b-Instruct-hh-sft-30
Anonymopus_Kaou6
Qwen3-4B-Thinking-2507-exp05
StudyAiv22
agentic-futoshiki-NonMarkov_qwen2.5-3B-5e-6_gt-SFT_10k
What.Is.This.Shit_RP-2B
Affine-193-5CtmVuY8eCeumgbEps55Bknw9vjuLqHsiQH7dcc3kaXXUb7r
99-caldpo-dataset-our-39-llama3-2-3b-instruct-merged
Affine_5DJHkQEio6qSayH3woPeahUXBsB4Dg5WdJuNCvgVhxcoqfKY
Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule
Qwen3-4B-DPO
vd-8-step58
Llama-3.1-8B-Benefit-Specialist
short_paper_qwen_1.json_train_dpo_v4_train_no_think
Affine-5HSp1dWtGppxvnsRvDYsWMwWMihzZbftwUU12LGAfwhnECdp
Qwen3-1.7B-FKD
agentic-sudoku-NonMarkov_qwen2.5-3B-5e-6_9x9_6-6_gt-SFT_ans1-7k
short_paper_llama_1.json_train_dpo_v3_train_no_think
llama32-1b-og-dpo-hh
Qwen2.5-3B-Instruct-misaligned-ft
Qwen2.5-3B-Instruct_new_alpaca_003
affine-00-5E9ffBCnChMfm8RkghPgDgzQdg7XHwbdJouk7cd7fH34SwQr
Anonymous57_merged_plus_plus_Kaou3
affine-Vampire3-5EeuntknoZqfaYFpowKGwcZQFQJAgiRhNWfJPrUFXos46Ca8
Qwen3-4B-dimacs_cube-sft_gpt-oss-120b-dpo_gpt-oss-120b_reasoning-v2
chess-v6-aicrowd
tony-seba-qwen3-merged
qwen_25_1_5b_swallow_code_unstructured
qwen3_0.6b_xlam_function_calling
agentic-futoshiki-NoStateTrans_qwen2.5-3B-5e-6_gt-SFT_20k
Affine-119-5CfZAuMoM2iTGoge5KXWBi1fqtbe99LCFsqm5NrHxxgRTaLh
llama_32_1b_alma
Affine-Snake-5Hg1K2prUdnvSnG7m3mZBmF9hyo8zu8Z4miJSYsfe9Hpvgcu
Affine-color-5Gc21jWvHzD9zZth9EgbiiS6u12F18sbL8SkbqEFTq9GLqpQ
qwen3-4b_grpo_skywork_code_sandbox_2-global_step_800
affine-g-5-5EhM3q9z5Yj4Vf2sgUSEbBTuqCvdMqQvFrnA3N9ZHnbxv7jG
chess-special-85100
Qwen2.5-7B-orz