Name: Jihyung803/Qwen3-8B-SOCIALIQA-DPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Jihyung803

Model Overview

Jihyung803/Qwen3-8B-SOCIALIQA-DPO is an 8 billion parameter language model derived from the Qwen3-8B architecture. This model has undergone fine-tuning using the Direct Preference Optimization (DPO) method, a technique designed to align language models with human preferences by treating preference data as implicit reward signals. The training process utilized the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

Preference Alignment: Fine-tuned with DPO to generate responses that are more aligned with human preferences, potentially leading to more helpful and less harmful outputs.
Conversational AI: Optimized for social intelligence tasks, making it suitable for generating nuanced and contextually appropriate responses in dialogue.
Base Model Strength: Benefits from the robust capabilities of the Qwen3-8B base model, including its 32,768 token context length, allowing for processing and generating longer, more complex texts.

Training Details

The model was trained using the DPO method, as detailed in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model." This approach directly optimizes a policy to satisfy human preferences without explicitly training a separate reward model. The training procedure was tracked and can be visualized via Weights & Biases. Key frameworks used include TRL 0.25.0 and Transformers 4.57.6.

Overview

Model Overview

Key Capabilities

Training Details

Full Model Card (README)