Name: CEIA-RL/qwen3-4b-dw-lr-dpo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: CEIA-RL

Model Overview

CEIA-RL/qwen3-4b-dw-lr-dpo is a 4 billion parameter language model, building upon the cemig-temp/qwen3-4b-dw-lr base model. Its key differentiator lies in its training methodology: it has been fine-tuned using Online DPO (Direct Language Model Alignment from Online AI Feedback), a method introduced in the paper "Direct Language Model Alignment from Online AI Feedback" (arXiv:2402.04792). This approach aims to align the model's outputs more closely with desired human preferences through continuous feedback.

Key Capabilities

Online DPO Fine-tuning: Utilizes a novel training procedure for direct alignment based on online AI feedback.
Qwen3 Architecture: Benefits from the foundational capabilities of the Qwen3 model family.
Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and generating more coherent, extended responses.

Training Details

The model was trained using the TRL (Transformers Reinforcement Learning) library, specifically implementing the Online DPO method. This training process is designed to enhance the model's ability to generate aligned and preferred responses, making it suitable for applications where nuanced and human-like interaction is crucial.

Use Cases

This model is particularly well-suited for applications requiring:

Conversational AI: Generating more aligned and contextually appropriate dialogue.
Instruction Following: Producing outputs that better adhere to user instructions and preferences.
Research in Alignment: Exploring the effectiveness of Online DPO for language model fine-tuning.

Overview

Model Overview

Key Capabilities

Training Details

Use Cases

Full Model Card (README)