Model Overview
Yurori/qwen3-4b-dpo-qwen-cot-merged is a 4-billion-parameter language model derived from the Qwen3-4B-Instruct-2507 base model. It was fine-tuned with Direct Preference Optimization (DPO), a method that aligns model outputs with human preferences by training directly on preference pairs, without a separate reward model. Fine-tuning used the Unsloth library for memory-efficient training.
Key Characteristics
- Base Model: Qwen/Qwen3-4B-Instruct-2507, a robust foundation for instruction-following tasks.
- Optimization Method: Direct Preference Optimization (DPO), enhancing the model's ability to generate preferred responses.
- Weights: Ships fully merged 16-bit weights, so no separate adapter loading is required for deployment, simplifying integration.
- Training Configuration: Fine-tuned for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 1024 tokens.
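The training hyperparameters above can be collected into a single configuration sketch. The dictionary below is illustrative: the key names are ours, not the original training script's arguments.

```python
# Illustrative DPO hyperparameters taken from this model card.
# Key names are hypothetical; the actual training script may differ.
dpo_config = {
    "base_model": "Qwen/Qwen3-4B-Instruct-2507",
    "num_epochs": 1,
    "learning_rate": 1e-07,   # very conservative LR, typical for DPO on an already-instruct model
    "beta": 0.1,              # DPO temperature: penalty for drifting from the reference policy
    "max_seq_length": 1024,   # maximum tokens per training sequence
}
```

A low learning rate with a modest beta is a common DPO choice: it nudges the policy toward preferred responses while keeping it close to the instruction-tuned reference model.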
Ideal Use Cases
This model is particularly well-suited for applications where:
- Preference Alignment: responses must closely match desired human preferences or specific output styles.
- Instruction Following: adherence to complex instructions matters; the DPO fine-tuning targets this.
- Efficient Deployment: the merged 16-bit weights allow straightforward integration without additional adapter management.
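Because the repository ships merged weights, loading is a standard `transformers` call with no PEFT/adapter step. A minimal sketch, assuming the Hugging Face `transformers` library (and a PyTorch backend) is installed; the `load` helper name is ours:

```python
MODEL_ID = "Yurori/qwen3-4b-dpo-qwen-cot-merged"

def load(model_id: str = MODEL_ID):
    """Load the merged model directly -- no separate adapter loading needed."""
    # Imported inside the function so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model
```

Since the 16-bit weights are already merged, the checkpoint behaves like any standalone causal-LM repository: `from_pretrained` is sufficient, and `torch_dtype="auto"` picks up the stored precision.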