Model Overview
This model, sfutenma/dpo-qwen3_4b-cot-merged_v260227-161515, is a 4-billion-parameter Qwen3-based language model developed by sfutenma. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, building on the base model sfutenma/lora_structeval_t_qwen3_4b_v260221-161528. Fine-tuning focused on aligning the model's responses with preferred outputs, specifically targeting improvements in Chain-of-Thought (CoT) reasoning and the quality of structured responses.
Key Capabilities
- Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) prompting to improve logical deduction and problem-solving.
- Structured Response Generation: Designed to produce high-quality, well-formatted structured outputs based on preference data.
- DPO Fine-tuning: Leverages Direct Preference Optimization for better alignment with desired response characteristics.
- Merged Weights: Provides full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment.
Training Details
The model was trained for 5 epochs with a learning rate of 2e-07, a DPO beta of 0.03, and a maximum sequence length of 768 tokens, using the u-10bei/dpo-dataset-qwen-cot dataset. The base model's LoRA adapter (r=8, alpha=16) was merged into the final full-precision weights during the process.
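The hyperparameters above can be expressed as a TRL-style DPO configuration. This is a hedged sketch only: the actual run used Unsloth (whose trainer accepts a compatible config), and the output directory name is an assumption, not a value from the original training script.

```python
# Hedged sketch: the training hyperparameters above as a TRL DPOConfig.
# Assumptions: the use of TRL's DPOConfig and the output directory name;
# the actual Unsloth training script may differ in structure.
from trl import DPOConfig

config = DPOConfig(
    output_dir="dpo-qwen3_4b-cot",  # assumed name, for illustration
    num_train_epochs=5,
    learning_rate=2e-7,
    beta=0.03,       # DPO preference-strength temperature
    max_length=768,  # maximum sequence length (prompt + completion)
)
```

A trainer such as `trl.DPOTrainer` (or Unsloth's wrapper around it) would take this config together with the base model and the u-10bei/dpo-dataset-qwen-cot preference pairs.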
Ideal Use Cases
This model is particularly well-suited to applications where precise reasoning, logical coherence, and structured output are critical. Because the weights are fully merged, developers can load it directly with the transformers library (no adapter loading step is required) for conversational tasks that demand structured, well-reasoned responses.
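A minimal integration sketch follows. The system-prompt wording, dtype, and generation settings are illustrative assumptions rather than values documented for this model; only the model ID comes from this card.

```python
# Minimal sketch of using the merged model via transformers.
# The "step by step" system instruction is an illustrative assumption,
# not a documented requirement of this model.

MODEL_ID = "sfutenma/dpo-qwen3_4b-cot-merged_v260227-161515"

def build_cot_messages(question: str) -> list[dict]:
    """Wrap a question in a chat-format message list with a CoT nudge."""
    return [
        {"role": "system", "content": "Reason step by step before answering."},
        {"role": "user", "content": question},
    ]

def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the merged 16-bit weights (no adapter needed) and generate."""
    # Heavy third-party imports are local so the prompt helper stays importable.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_cot_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For example, `generate("If a train leaves at 3pm traveling 60 mph, how far does it travel by 5pm?")` would return a step-by-step answer; adjust `max_new_tokens` upward for longer reasoning chains.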