Models

372
4B · 32K · qwen3-4b · Warm · ottys/dpo-qwen-cot-merged · 0 · 5 · Feb 2026
8B · 8K · llama3-8b · Warm · GeorgiaTech/0.0_llama_nodpo_3iters_bs128_531lr_iter_1 · 0 · 4
8B · 32K · llama31-8b · Warm · mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.3-70b · 0 · 4
1B · 2K · tinyllama-1b1 · Warm · martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old · 0 · 4
1B · 2K · tinyllama-1b1 · Warm · FormlessAI/fc4de999-dedc-4db2-802f-db560f0914a9 · 0 · 4
3B · 8K · gemma2-2b · Warm · SongTonyLi/gemma-2b-it-SFT-D1_chosen-then-DPO-D2a-HuggingFaceH4-ultrafeedback_binarized-Xlarge · 0 · 4
1B · 32K · llama32-1b · Warm · XEH-Odys/DPO_win_rate · 0 · 4
800M · 32K · qwen3-0b6 · Warm · albertfares/DPO_MCQA_model_3_06_04_08 · 0 · 4
4B · 32K · qwen3-4b · Warm · reiwa7/dpo-qwen-cot-merged-s250 · 0 · 4 · Feb 2026
4B · 32K · qwen3-4b · Warm · nyannto/dpo-qwen-cot-merged · 0 · 4 · Feb 2026
4B · 32K · qwen3-4b · Warm · ogwata/exp7-dpo-baseline · 0 · 4 · Feb 2026
4B · 32K · qwen3-4b · Warm · q-hisa/dpo-qwen-cot-merged-v5 · 0 · 4 · Feb 2026
4B · 32K · qwen3-4b · Warm · shinich001/dpo-qwen-cot-merged · 0 · 4 · Feb 2026
8B · 8K · llama3-8b · Warm · GeorgiaTech/0.0005_llama_nodpo_3iters_bs128_531lr_oldtrl_iter_2 · 0 · 3
13B · 4K · llama2-13b · Warm · ContextualAI/archangel_dpo_llama13b · 0 · 3
8B · 32K · llama31-8b · Warm · mlfoundations-dev/simpo-oh_teknium_scaling_down_ratiocontrolled_0.9 · 0 · 3
8B · 32K · qwen2-7b · Warm · mlfoundations-dev/dpo_from_stratos_judged_annotated_rejected_responses · 1 · 3
1B · 32K · llama32-1b · Warm · Eshita-ds/Llama-3.2-1B-DPO · 0 · 3
1B · 32K · llama32-1b · Warm · AIR-hl/Llama-3.2-1B-DPO · 0 · 3
1B · 32K · llama32-1b · Warm · jessemeng/TwinLlama-3.2-1B-DPO · 1 · 3