qgallouedec/online-dpo-qwen2-4
qgallouedec/online-dpo-qwen2-4 is a 0.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2-0.5B-Instruct. Developed by qgallouedec, it is aligned with Online DPO, the method introduced in the paper "Direct Language Model Alignment from Online AI Feedback". The model targets conversational AI and instruction-following tasks, offering a compact yet capable option for generating human-like text responses.
Model Overview
qgallouedec/online-dpo-qwen2-4 is a 0.5-billion-parameter language model fine-tuned from the Qwen/Qwen2-0.5B-Instruct base model. It was trained with Online DPO, the online variant of Direct Preference Optimization introduced in the paper "Direct Language Model Alignment from Online AI Feedback". This training approach improves the model's ability to follow instructions and generate aligned responses.
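A minimal inference sketch using the transformers pipeline API; the model ID comes from this card, while the generation settings (e.g. max_new_tokens) are illustrative assumptions, not values stated here.

```python
# Quick-start inference sketch for qgallouedec/online-dpo-qwen2-4.
# The transformers import and pipeline construction are kept inside the
# function because loading the weights is the expensive step.
def generate_reply(prompt: str, max_new_tokens: int = 128):
    from transformers import pipeline

    generator = pipeline("text-generation", model="qgallouedec/online-dpo-qwen2-4")
    # Chat-style input: the pipeline applies the model's chat template for us.
    messages = [{"role": "user", "content": prompt}]
    outputs = generator(messages, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]
```

Calling `generate_reply("Explain Online DPO in one sentence.")` downloads the ~0.5B-parameter checkpoint on first use, so expect a one-time fetch of the weights.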
Key Capabilities
- Instruction Following: Enhanced ability to understand and respond to user prompts based on its DPO training.
- Conversational AI: Suitable for generating coherent and contextually relevant text in dialogue-based applications.
- Compact Size: At 0.5 billion parameters, it offers a balance between performance and computational efficiency, making it viable for deployment in resource-constrained environments.
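For the conversational use case above, inputs are passed as role/content message lists. Qwen2 models use a ChatML-style chat template; the pure-Python formatter below is a sketch of what `tokenizer.apply_chat_template(..., add_generation_prompt=True)` produces, and in practice you should rely on the tokenizer's own template rather than hand-rolling it.

```python
# Sketch of ChatML-style prompt formatting as used by Qwen2 chat models.
# Prefer tokenizer.apply_chat_template in real code; this is for illustration.
def format_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")  # generation prompt: cue the model to answer
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Online DPO in one sentence."},
]
prompt = format_chatml(messages)
print(prompt)
```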
Training Details
The model was fine-tuned on the trl-lib/ultrafeedback-prompt dataset using the TRL library. Training used TRL 0.12.0.dev0, Transformers 4.45.0.dev0, PyTorch 2.4.1, Datasets 3.0.0, and Tokenizers 0.19.1. The training run can be inspected on Weights & Biases here.
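A hedged reconstruction of this setup with TRL's OnlineDPOTrainer on the dataset named above. The card does not state which judge or reward model was used, nor any hyperparameters, so the PairRMJudge and default OnlineDPOConfig below are assumptions chosen for illustration.

```python
# Sketch of an Online DPO fine-tuning run with TRL (assumed setup, not the
# card's exact recipe). Dependencies are imported inside the function so the
# module can be loaded without pulling in TRL or downloading any weights.
def train_online_dpo(output_dir: str = "online-dpo-qwen2"):
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import OnlineDPOConfig, OnlineDPOTrainer, PairRMJudge

    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
    dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")

    judge = PairRMJudge()  # ASSUMPTION: the card does not name its judge/reward model
    args = OnlineDPOConfig(output_dir=output_dir)  # defaults; real hyperparameters unknown
    trainer = OnlineDPOTrainer(
        model=model,
        judge=judge,
        args=args,
        train_dataset=dataset,
        processing_class=tokenizer,
    )
    trainer.train()
```

Online DPO samples completions from the policy during training and scores them with the judge, which is what distinguishes it from offline DPO on a fixed preference dataset.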
Good For
- Developing chatbots and virtual assistants.
- Generating responses for instruction-based tasks.
- Applications requiring a smaller, efficient language model with improved alignment.