mpq3_qwen4bi_sft_dpo_beta1e-1_step3840
mpq3_qwen4bi_sft_dpo_beta1e-1_step4864
mpq3_qwen4bi_sft_dpo_beta1e-1_step5120
mpq3_qwen4bi_sft_dpo_beta1e-1_step7168
mpq3_qwen4bi_sft_dpo_beta1e-1_step9728
mpq3_llama8b_sft_dpo_beta1e-1_step1024
mpq3_llama8b_sft_dpo_beta1e-1_step1792
mpq3_llama8b_sft_dpo_beta1e-1_step2048
mpq3_llama8b_sft_dpo_beta1e-1_step3072
psydetect_llama_32_3b_instruct_1em4_merged
mpq3_llama8b_sft_dpo_beta1e-1_step9216
mpq3_llama8b_sft_dpo_beta1e-1_step9728
mpq3_llama8b_sft_dpo_beta1e-1_step10240
GEC-from-explanations-4BInstr-distilled-v2303
HealthyMLmreged
Llama3.2-3B_Paper_Impact_SFT
Llama3.2-3B_Paper_Impact_dataset_SFT_1ep
Llama3.2-3B_Paper_Impact_patent_SFT_1ep
dpo-merged-vllm-r4-r3
z0406_rt_ordinary_RT_quirk_1_lr5e-5
b1_top2_seq
b1_top8_seq
z0406_rt_ordinary_RT_quirk_1_lr1e-4
Llama2-7BSST2
Llama-3.1-8B-Alpaca-Indo-GRPO
snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.6.1-cw-17K
day1-train-model
1lakh_embed
transplant-logistics-grpo
customer-success-assistant
WebArbiter-4B-Qwen3
chase-defender-v6
Llama-3.2-3B-Instruct-EL-SynthDolly-1A-E1
parser_model_ner_4.6
train_mnli_42_1775732963
c1_kimi_k2.5
qwen25_7b_base_hc_ssss_n32_r1_no_know_dpo
general_reward-Qwen3-0.6B_7168-baseline_all_tokens-seed_0
RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-highcov-batchaccgated-hotpot
new-train
qwen25_7b_base_hc_tsss_n32_r1_dpo