oretti/dpo-qwen-merged
The oretti/dpo-qwen-merged model is a 4-billion-parameter language model based on the Qwen3-4B-Instruct-2507 architecture. It has been fine-tuned with Direct Preference Optimization (DPO) to strengthen Chain-of-Thought (CoT) reasoning and improve the quality of structured responses. The model is optimized for generating aligned, coherent outputs, making it suitable for tasks that require sound reasoning and structured text generation.
Model Overview
The oretti/dpo-qwen-merged model is a 4-billion-parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, and its full 16-bit weights have been merged, so the model can be used directly without loading adapters.
Key Capabilities
- Enhanced Reasoning: Optimized through DPO to improve Chain-of-Thought (CoT) reasoning, enabling more structured and logical responses.
- Improved Output Quality: Focuses on aligning responses with preferred outputs, leading to higher quality and more coherent text generation.
- Direct Integration: Provided as a fully merged model, simplifying deployment with transformers, since no LoRA adapter loading is required.
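Because the DPO weights are already merged, the model loads like any standard causal LM. Below is a minimal usage sketch with the standard transformers chat API; the prompt text and generation settings are illustrative, not prescribed by the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "oretti/dpo-qwen-merged"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # load in the merged 16-bit precision
    device_map="auto",
)

# Plain chat-template prompting; no PeftModel / adapter step is needed
# because the DPO weights are merged into the base model.
messages = [
    {"role": "user", "content": "A train leaves at 3 pm travelling 60 km/h. "
                                "How far has it gone by 5:30 pm? Think step by step."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

The slice `output_ids[0][input_ids.shape[-1]:]` strips the prompt tokens so only the newly generated answer is decoded.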
Training Details
The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.3, using a maximum sequence length of 1024. The training utilized the u-10bei/dpo-dataset-qwen-cot preference dataset, which is designed to improve reasoning and structured response generation.
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Complex Reasoning Tasks: Where structured and logical thought processes are beneficial.
- High-Quality Text Generation: For scenarios demanding aligned and coherent outputs.
- Instruction Following: Benefiting from the DPO fine-tuning for better adherence to preferred response styles.