mpq3_llama8b_sft_dpo_beta1e-1_step1792
mpq3_llama8b_sft_dpo_beta1e-1_step2048
mpq3_llama8b_sft_dpo_beta1e-1_step3072
psydetect_llama_32_3b_instruct_1em4_merged
mpq3_llama8b_sft_dpo_beta1e-1_step9216
mpq3_llama8b_sft_dpo_beta1e-1_step9728
mpq3_llama8b_sft_dpo_beta1e-1_step10240
GEC-from-explanations-4BInstr-distilled-v2303
HealthyMLmreged
Llama3.2-3B_Paper_Impact_SFT
Llama3.2-3B_Paper_Impact_dataset_SFT_1ep
Llama3.2-3B_Paper_Impact_patent_SFT_1ep
dpo-merged-vllm-r4-r3
z0406_rt_ordinary_RT_quirk_1_lr5e-5
b1_top8_seq
z0406_rt_ordinary_RT_quirk_1_lr1e-4
Llama2-7BSST2
Dolphin3.0-R1-Mistral-24B
1lakh_embed
parser_model_ner_4.4
customer-success-assistant
WebArbiter-4B-Qwen3
chase-defender-v6
Llama-3.2-3B-Instruct-EL-SynthDolly-1A-E1
parser_model_ner_4.6
rl_nmt_2026_04_09_13_37
c1_kimi_k2.5
qwen25_7b_base_hc_ssss_n32_r1_no_know_dpo
general_reward-Qwen3-0.6B_7168-baseline_all_tokens-seed_0
RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-highcov-batchaccgated-hotpot
new-train
skirbi-papiamento-merged
qwen25_7b_base_hc_tsss_n32_r1_dpo
ccmai
QWEN3-4B-CPT
d1_trace_hints_top4_seq_glm47
thought-reasoning-model-v1
d1_mix_top4_seq_glm47
final_proj-stage2-best-lr1e4-r16-merged-bf16
diallm-llama-grpo-all
mistral-7b-base-sft-hh-helpful-4xh200-batch-64
merged_beat_champ_3model_dare075