Models

372
1B · 32K · llama32-1b · Warm
rvergara2017/dpo_llama-3.2-1B-tldr
0 · 3

1B · 32K · llama32-1b · Warm
thaapala/TwinLlama-3.1-8B-DPO
0 · 3

1B · 32K · llama32-1b · Warm
Likhith003/dpo-llmjudge-lora-adapter
0 · 3

3B · 32K · llama32-3b · Warm
CriteriaPO/llama3.2-3b-dpo-vanilla
0 · 3 · May 2025

3B · 32K · llama32-3b · Warm
CriteriaPO/llama3.2-3b-dpo-mini
0 · 3 · May 2025

1B · 32K · llama32-1b · Warm
kowndinya23/ultrafeedback_binarized-alpaca-llama-3-1b-2-epochs-alpha-0.8-beta-0-2-epochs
0 · 3

4B · 32K · qwen3-4b · Warm
sfutenma/dpo-qwen3_4b-cot-merged
0 · 3 · Feb 2026

4B · 32K · qwen3-4b · Warm
nyannto/dpo-qwen-cot-merged11
0 · 3 · Feb 2026

4B · 32K · qwen3-4b · Warm
toshiyuki-kato/dpo-qwen-cot-merged
0 · 3 · Feb 2026

8B · 32K · llama31-8b · Warm
mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b
0 · 2

1B · 2K · tinyllama-1b1 · Warm
FormlessAI/fc9fed29-6631-4ae8-88f3-8e302372e78d
0 · 2

1B · 2K · tinyllama-1b1 · Warm
Romain-XV/8731c7bb-4c2a-4698-a284-e0ce485df099
0 · 2

500M · 32K · qwen2-0b5 · Warm
qgallouedec/online-dpo-qwen2-3
0 · 2

500M · 32K · qwen2-0b5 · Warm
qgallouedec/Qwen2-0.5B-OnlineDPO-GRM-Gemma
0 · 2

500M · 32K · qwen2-0b5 · Warm
Kyleyee/Qwen2-0.5B-DPO-imdb_kl_02
0 · 2

1B · 32K · llama32-1b · Warm
rvergara2017/dpo-tldr-llama3.1-1b
0 · 2

1B · 32K · llama32-1b · Warm
bahaelaila7/smollm2-1.7B-dpoo
0 · 2

13B · 4K · llama2-13b · Warm
ContextualAI/archangel_sft-dpo_llama13b
0 · 1

1B · 2K · tinyllama-1b1 · Warm
Romain-XV/7f9b617b-66a6-4ebf-9021-450f96b99bc7
0 · 1

1B · 2K · tinyllama-1b1 · Warm
alexkahng/fin-llm-dpo-lora
0 · 1