Model Overview
staeiou/bartleby-qwen3-1.7b_dpo is a 1.7-billion-parameter language model based on the Qwen3 architecture. It was fine-tuned with Direct Preference Optimization (DPO), a method that aligns language model outputs more closely with human preferences, using the TRL (Transformer Reinforcement Learning) library.
Key Capabilities
- Preference Alignment: Trained via DPO on pairs of preferred and rejected responses, so its outputs are optimized toward human-preferred text.
- Qwen3 Architecture: Inherits the capabilities of the Qwen3 base model.
- Context Length: Supports a 32,768-token context window, allowing it to process and generate long sequences of text.
Training Details
The model's fine-tuning employed the DPO method, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model." DPO reparameterizes the reward in terms of the policy itself, so the model is trained directly on preference pairs with a simple classification-style loss. This avoids both training a separate reward model and the reinforcement learning loop used in classic RLHF. The training was conducted using TRL, a library for post-training transformer language models.
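The per-pair loss behind the DPO method described above can be sketched in plain Python. This is a minimal illustration of the objective, not TRL's actual implementation; the log-probabilities and the β value below are made-up toy numbers for demonstration only:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are the summed log-probabilities of the chosen and
    rejected responses under the policy being trained and under
    the frozen reference (pre-DPO) model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); log1p keeps it numerically stable
    return math.log1p(math.exp(-logits))

# Hypothetical log-probs: the policy already prefers the chosen
# response relative to the reference, so the loss falls below
# log(2), the value at indifference.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.1)
```

Minimizing this loss pushes the policy to raise the likelihood of chosen responses relative to rejected ones, while β controls how far it may drift from the reference model.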
Good For
- Applications requiring text generation that is highly aligned with human preferences.
- Tasks where nuanced and contextually appropriate responses are critical.
- Developers looking for a DPO-tuned model with a long (32k-token) context window for various language generation tasks.
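For the use cases above, the model can be run with the standard transformers chat pattern. This is a generic usage sketch, not code from the model card: it assumes the model ships a Qwen3-style chat template, and it downloads the weights on first run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "staeiou/bartleby-qwen3-1.7b_dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Summarize the idea behind DPO in two sentences."}
]
# Apply the model's chat template and move inputs to the model's device
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```

For long-context workloads, the same pattern applies; inputs up to the 32,768-token window can be passed in, subject to available memory.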