c67-h18
Affine-0201-5D9eA7XJDtXsKFk9CJLYrN7KxaDendzSpbnKbNLNz1yZb3KT
qwen_falcon_6.json_train_dpo_v1_2.json
dpo-qwen-cot-merged
model
CURE-MED-1.5B
Heretic.Erudite_v2-1B
sched-v4
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-pensive_leggy_ant
unlearn_tofu_Llama-3.2-1B-Instruct_forget10_AltPO_lr2e-05_beta0.1_alpha1_epoch5
Malaysian-Llama-3.2-1B-Instruct-v0.1
DisCO-1.5B-logL
Qwen3-4B-Thinking-2507-heretic
Qwen3-1.7B-Tiny-Hanabi-XML-SFT-5
gr2
u1
Qwen3-0.6B-Gensyn-Swarm-thick_scurrying_cat
qwen3_0.6B_Claude_4.5_distill
qwen2.5-1.5b-grpo-no-sft-sgd-linear
Qwen3-0.6B-Gensyn-Swarm-fishy_pouncing_hare
darwin_iter2_solver_all
Qwen3-0.6B-Gensyn-Swarm-wild_meek_wolf
Alfworld-qwen2.5-3b-it-obs-2
qwen3-4b-alf-traj-v1-merged
qwen2.5-3b_Instruct_policy_traj_30k_full
HarnessLLM_SFT_Qwen3_4B
OceanGPT-basic-4B-Thinking
qwen3-4b-instruct-75k-int
qwen-reranker-finetuned-entity-linking
qwen3-1.7b-bilingual-amr-sft-v1
c1db03a5
Qwen2.5-Sex
Qwen3-0.6B-Full-Finetuning-No-Thinking
AgenticCoder-4B
Kurtis-E1.1-Qwen2.5-3B-Instruct
0_config_my_Best13_2375_Qwen_official_INF
20260217-Qwen3-0.6B_grpo_sycophancy_warmup_4x_baseline_320000_episodes_seed_42
llama3.2_1b_psyscam
poetic-assistant-phi3-v1
alpha_0_DeepSeek-R1-Distill-Qwen-1.5B
C04-none-none-lora-offdomain-qwen3-4b