Name: sfutenma/dpo-qwen3_4b-cot-merged_v260302-112329 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: sfutenma

Model Overview

This model, sfutenma/dpo-qwen3_4b-cot-merged_v260302-112329, is a 4 billion parameter language model based on the Qwen3 architecture. It has been fine-tuned by sfutenma using Direct Preference Optimization (DPO) via the Unsloth library, building upon a lora_structeval_t_qwen3_4b base model. The primary objective of this DPO training was to align the model's responses with preferred outputs, specifically targeting improvements in reasoning (Chain-of-Thought) and the generation of structured responses.

Key Capabilities

Enhanced Reasoning: Optimized for Chain-of-Thought (CoT) prompting to improve logical processing.
Structured Output Quality: Fine-tuned to produce higher quality, well-formatted structured responses.
DPO Fine-tuning: Leverages Direct Preference Optimization for better alignment with desired output characteristics.
Full-Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading.

Training Details

The model underwent 5 epochs of DPO training with a learning rate of 5e-07 and a beta value of 0.1. It utilized a maximum sequence length of 768 tokens during training. The training data used was u-10bei/dpo-dataset-qwen-cot, and the model is released under the MIT License, with users required to comply with the original base model's license terms.

Overview

Model Overview

Key Capabilities

Training Details

Full Model Card (README)