zhs-Qwen2.5-7B-AS-step-260-discount-1p0
qwen15_code200tok_t06_ce003_pr1
PH_det_sft_FC_swap_labewise_data_oversampling_bf16_lr0.00002_context_12k-Qwen3-8B-Base
O04-topic-wronganswer-lora-qwen3-8b
bs3v2_qwen1b5_cnndm
Llama-3.1-8B-Harm-Specialist-Top1
Llama-3.1-8B-Benefit-Specialist-Top1
qwen3-4B-dpo-anti-fence-240slow26
dpo-qwen-cot-merged
test09-dpo
DAC5-3B
exp-uns-r2egym-8_4x_glm_4_7_traces_jupiter
qwen2.5-math-1.5b-grpo-ep20
EstopianMaid-13B
test14-dpo
qwen3-4b-agent-v4
exp27-dpo-r16
adv_sft_dpo_final_6_merged
qwen3-4b-agent-v8
20260228-helpfulness-Qwen3-0.6B_grpo_OURS_seed_42_wo_warmup
Qwen2.5-32B-Instruct-ftjob-f2b95c71d56f
llm_advance_024_enhanced_rules
qwen25-7b-sft-merged-v5v6-a50
agentbench-qwen3-4b-2stage-reasoning-20260228
LLM2026_DPO_SFT19_v18
qwen3-4b-dpo-v1
dpo-qwen3_4b-cot-merged_v260302-010243
qwen3-4b-agent-v13
qwen3-4b-agent-v14
GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_adam-beta1_0-91_Qwen3-32B
exp-uns-r2egym-2_1x_glm_4_7_traces_jupiter_cleaned
MiLMMT-46-4B-Pretrain
chatqa1.5_ir0.5_d1w_0.5mix1.0
epsteinLM-synth-2602-ckpt4
MedMistral-CPT-SFT-7B
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-unseen_gentle_duck
exp-syh-r2egym-askllm-hardened_glm_4_7_traces_jupiter
parser_model_ner_3.99
MonkeGpt-Vivace
PH_prob_Qwen3-8B_0304-01