Name: kikiyaa/qwen-dpo-finetuned-ver2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kikiyaa

Overview

The kikiyaa/qwen-dpo-finetuned-ver2 is a 7.6 billion parameter language model, building upon the base architecture of Qwen/Qwen2.5-7B. Developed by kikiyaa, this model has undergone further fine-tuning using the Direct Preference Optimization (DPO) method, as introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (2305.18290). This training approach aims to align the model's outputs more closely with human preferences.

Key Capabilities

Preference-tuned Responses: Utilizes DPO for generating outputs that are aligned with specified preferences, potentially leading to more helpful and desirable text.
General Text Generation: Capable of various text generation tasks, leveraging its 7.6 billion parameters and a substantial context window of 32768 tokens.
TRL Framework: Trained using the TRL (Transformers Reinforcement Learning) library, indicating a robust and established training pipeline.

Training Details

The model's fine-tuning process specifically employed DPO, a technique that directly optimizes a language model to act as its own reward model. This method is known for its effectiveness in improving model alignment without requiring a separate reward model. The training was conducted using TRL version 1.1.0, with Transformers 5.5.4 and Pytorch 2.9.1+cu128.

Overview

Overview

Key Capabilities

Training Details

Full Model Card (README)