Name: tomofusa/exp033-dpo-wd005-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: tomofusa

Model Overview

tomofusa/exp033-dpo-wd005-merged is a 4-billion-parameter language model developed by tomofusa. It is a merged model that combines a Supervised Fine-Tuning (SFT) phase with a subsequent Direct Preference Optimization (DPO) phase. It is distributed with full 16-bit weights, so it can be loaded directly with no separate adapter, which simplifies deployment.
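
As a rough illustration of what loading without an adapter looks like, the sketch below loads the repository in a single call with the Hugging Face transformers library. It assumes the weights are available on the Hugging Face Hub under the ID given in the Name field and that transformers and torch are installed; this page confirms neither. The helper estimate_weight_gib is a hypothetical name introduced here, and its figure treats "4 billion" as exactly 4e9 parameters.

```python
MODEL_ID = "tomofusa/exp033-dpo-wd005-merged"  # taken from the Name field of this page


def estimate_weight_gib(num_params: float, bits_per_param: int = 16) -> float:
    """Approximate size of the raw weights in GiB (ignores activations and KV cache)."""
    return num_params * bits_per_param / 8 / 2**30


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in one step; no PEFT adapter is attached afterwards."""
    # Imported lazily so the size estimate above works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" keeps whatever dtype the checkpoint stores (assumed 16-bit here).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model


if __name__ == "__main__":
    print(f"~{estimate_weight_gib(4e9):.2f} GiB of weights at 16 bits per parameter")
```

Under these assumptions the weights alone come to roughly 7.45 GiB, so actual memory use at inference time will be higher.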

Training Details

The model's training pipeline involved two main stages:

SFT Phase: Initialized from tomofusa/exp015-blend-h-lora.
DPO Phase: Further optimized on the u-10bei/dpo-dataset-qwen-cot dataset for one epoch. Key DPO settings were a learning rate of 5e-07, a beta of 0.1, and the IPO loss type (ipo). LoRA was applied during DPO with r=64 and alpha=128, and the maximum sequence length was 1024.
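
To make the ipo loss type concrete, here is a minimal sketch of the per-pair IPO objective as it is commonly implemented: the squared distance between the policy-versus-reference log-ratio margin and 1 / (2 * beta). The dictionary only restates the settings listed above. Its key names and the function name ipo_loss are illustrative and are not taken from the author's training script, and implementations differ on whether the log-probabilities are summed or averaged over tokens.

```python
# Settings as stated on this page; the key names are illustrative only.
DPO_SETTINGS = {
    "learning_rate": 5e-7,
    "beta": 0.1,
    "loss_type": "ipo",
    "num_epochs": 1,
    "max_length": 1024,
    "lora_r": 64,
    "lora_alpha": 128,
}


def ipo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = DPO_SETTINGS["beta"]) -> float:
    """IPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # IPO regresses the margin toward a fixed target instead of pushing it to infinity.
    target = 1.0 / (2.0 * beta)
    return (margin - target) ** 2


if __name__ == "__main__":
    # Policy identical to the reference: margin 0, so the loss is the squared target.
    print(ipo_loss(-1.0, -1.0, -1.0, -1.0))
```

With beta = 0.1 the target margin is 5, so a pair on which the policy matches the reference exactly incurs a loss of about 25, and the loss falls to zero once the margin reaches 5.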

Key Characteristics

Merged Architecture: Benefits from both SFT for foundational instruction following and DPO for preference alignment.
Full 16-bit Weights: Ready-to-use without adapter loading.
DPO Alignment: Specifically tuned for improved response quality and alignment with human preferences through Direct Preference Optimization.
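
The reason no adapter is needed is that the LoRA update has been folded into the base weights. A minimal sketch of that step, assuming the standard LoRA formulation W' = W + (alpha / r) * B A; this page does not say which merge tool or scaling variant was actually used. The matrices are tiny rank-1 stand-ins for real layers, and only the scale is derived from the stated r=64 and alpha=128.

```python
def lora_scale(r: int, alpha: float) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a) as a new matrix."""
    delta = matmul(lora_b, lora_a)  # shape (out x r) @ (r x in) = (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    scale = lora_scale(r=64, alpha=128)   # 2.0 for the settings on this page
    weight = [[1.0, 0.0], [0.0, 1.0]]     # toy base weight
    lora_b = [[1.0], [0.0]]               # toy B matrix (out x rank)
    lora_a = [[0.5, 0.0]]                 # toy A matrix (rank x in)
    print(merge_lora(weight, lora_a, lora_b, scale))
```

After merging, the result is an ordinary weight matrix of the original shape, which is why the checkpoint loads like any other model.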
