Name: jinkami07/dpo-qwen3-4b-r8-lr1e6-beta005-ep2-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jinkami07

Model Overview

This model, jinkami07/dpo-qwen3-4b-r8-lr1e6-beta005-ep2-merged, is a 4 billion parameter language model based on the Qwen3-4B-Instruct-2507 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, with its LoRA adapters (r=16, alpha=32) fully merged into the base model for direct use without additional adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, allowing for more logical and step-by-step problem-solving.
Structured Response Quality: Fine-tuned to produce higher quality, more structured outputs based on preference datasets.
DPO Alignment: Leverages DPO to align model responses with preferred human outputs, leading to more desirable and coherent generations.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 1e-06 and a beta value of 0.1. It utilized a maximum sequence length of 1024 tokens during training. The training data used was u-10bei/dpo-dataset-qwen-cot, which focuses on preference-based optimization.

Good For

Applications requiring improved logical reasoning and structured output generation.
Tasks where response alignment with human preferences is critical.
Developers seeking a Qwen3-based model with enhanced CoT capabilities.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)