qwen2.5-gangster_s669_lr1em05_r32_a64_e1
qwen2.5-rude_s89_lr1em05_r32_a64_e1
gemma2-aave_s67_lr1em05_r32_a64_e1
gemma2-unpopular_s89_lr1em05_r32_a64_e1
gemma2-unsafe_diy_s76789_lr1em05_r32_a64_e1
matsuo-llm-advanced-phase-e2b
Qwen3_4B_SFT_DPO_agent_v0
Korean-Qwen3-4B-Thinking-2507-sft
DDR1_Q1.5B-GRPO-CompMath-DummyReward
qwen3-4b-agent-v1
gemma2-gangster_s67_lr1em05_r32_a64_e1
syn-arxiv-dict
qwen3-4b-dpo-qwen-cot-merged-v7
M_qw306_run0_gen0_WXS_doc5_synt64_TEST_SYNLAST
M_qw306_run0_gen0_WXS_doc1000_synt64_lr1e-04_acm_SYNLAST
20260227-Qwen3-0.6B_compliance_w_warmup_grpo_baseline_192000_episodes_seed_42
Qwen2.5-1.5B-Open-R1-GRPO-FC
storyalive-qwen
qwen2.5-incel_slang_s89_lr1em05_r32_a64_e1
qwen-dpo-v3
matsuo-llm-advanced-phase-bf1-local
adv_sft_dpo_final_7_merged
adv_sft_dpo_final_8_merged
Qwen3-0.6b_dataclaw_mallet
Llama-3.1-8B-Instruct-GSM8K-Gemma-Distill
LLM-Advanced-Competition-2025
qwen25_7b_lora_agentbench_v6_e4
qwen25_7b_lora_agentbench_v11
EvoNet-3B-V9
qwen3-4b-agent-v10
matsuo-llm-advanced-phase-im3
dpo-qwen-cot-merged
Llama-3.1-8B-Instruct-GSM8K-PO-Distill
Llama-3.1-8B-Instruct-GSM8K-Gemma-Distill-Persona-Mixed
Serendip-LLM-CPT-SFT-v2
qwen3-14b-schema-matching
gemma2-rude_s76789_lr1em05_r32_a64_e1
Qwen2.5-3B-GRPO-Reasoning
qwen-synthetic-v1-ckpt-500
tinyllama-edcastr_JavaScript-v2
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-bold_dappled_goose
Qwen3-4B-Base-Continued-GRPO-Style-Karcher