Name: cs-552-2026-MMRF/3000Alpaca_15kDPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cs-552-2026-MMRF

Model Overview

cs-552-2026-MMRF/3000Alpaca_15kDPO is a 2-billion-parameter language model fine-tuned from the cs-552-2026-MMRF/3000alpaca base model. Its key differentiator is training with Direct Preference Optimization (DPO), the method introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". This fine-tuning step aims to align the model's outputs more closely with human preferences.

Key Capabilities

Preference-Aligned Text Generation: Generates responses optimized toward human preferences, a direct result of its DPO training.
Instruction Following: Understands and responds to user instructions, as demonstrated by its quick start example.
Extended Context Window: Supports a context length of 32,768 tokens, allowing it to process and generate longer texts while keeping earlier content in view.
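
The context length covers the prompt and the generated tokens together, so a long prompt leaves less room for output. A minimal sketch of that budget check follows; the helper name max_new_tokens_allowed is hypothetical and is not part of any library or of this model's API.

```python
# Context length stated in the model listing above.
CONTEXT_LENGTH = 32768


def max_new_tokens_allowed(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt.

    Hypothetical helper: the prompt and the generated text share one window,
    so the remaining budget is the window size minus the prompt length.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt longer than the window leaves no room to generate.
    return max(context_length - prompt_tokens, 0)
```

A caller would count prompt tokens with the model's own tokenizer and cap the generation length at the returned value.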

Training Details

The model was fine-tuned with the TRL library's implementation of the DPO algorithm. DPO optimizes a language model directly on preference data, without the need for a separate reward model. Training used TRL 1.3.0, Transformers 5.7.0, PyTorch 2.10.0+cu128, Datasets 4.8.5, and Tokenizers 0.22.2.
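
To make the "no separate reward model" point concrete, here is a minimal sketch of the per-example DPO loss from the paper, written in plain Python rather than with TRL. It is not the training code used for this model. The function name, argument names, and the default beta = 0.1 are assumptions for illustration; the listing does not state the beta used in training.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Each argument is the summed log-probability of a full response under the
    policy being trained or under the frozen reference model.
    beta = 0.1 is an assumed value, not one taken from this model's training.
    """
    # Log-ratios act as implicit rewards, so no reward model is trained.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls as the policy raises the probability of the chosen response relative to the rejected one.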
