Chattso-GPT/dpo-qwen-cot-merged
Chattso-GPT/dpo-qwen-cot-merged is a 4 billion parameter Qwen3-based instruction-tuned causal language model, fine-tuned using Direct Preference Optimization (DPO). It is specifically optimized to improve Chain-of-Thought (CoT) reasoning and to generate high-quality, structured responses. This makes it well suited to applications that require enhanced logical reasoning and structured output generation.
Overview
This model, Chattso-GPT/dpo-qwen-cot-merged, is a 4 billion parameter language model based on the Qwen/Qwen3-4B-Instruct-2507 architecture. It has undergone Direct Preference Optimization (DPO) using the Unsloth library, resulting in a full-merged 16-bit weight model that requires no adapter loading.
Key Capabilities
- Enhanced Reasoning: Optimized through DPO to improve Chain-of-Thought (CoT) reasoning, making it more effective for complex logical tasks.
- Structured Responses: Fine-tuned to align its outputs with preferred formats, leading to higher quality structured responses.
- Efficient Deployment: Provided as a fully merged model, simplifying deployment with the standard `transformers` library.
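Because the 16-bit weights are fully merged, the model can be loaded directly with `transformers` and no PEFT adapter step. The sketch below is a minimal, illustrative usage example (the generation settings and the sample question are assumptions, not part of the card):

```python
# Minimal usage sketch for the merged model; no adapter loading is needed.
# Model/tokenizer are fetched lazily inside generate() so the helper
# functions stay importable without the weights present.
MODEL_ID = "Chattso-GPT/dpo-qwen-cot-merged"


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format Qwen3 instruct models expect."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the merged model and generate a response to a single user question."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the chat template and append the assistant prompt marker.
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Solve step by step: what is 17 * 24?"))
```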
Training Details
The model was trained for 1 epoch with a learning rate of 1e-07, a beta of 0.1, and a maximum sequence length of 1024. Training used the u-10bei/dpo-dataset-qwen-cot dataset, which focuses on preference alignment for reasoning and structured output. The model is released under the MIT license; users must also comply with the terms of the original base model.
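For intuition on the beta value above: in DPO, beta scales the implicit KL penalty that keeps the policy close to the reference model. A minimal scalar version of the DPO loss can be sketched in plain Python (an illustration of the objective, not the actual training code, which used the Unsloth library):

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Scalar Direct Preference Optimization loss for one preference pair.

    The loss is -log sigmoid(beta * margin), where the margin compares how
    much more the policy prefers the chosen response over the rejected one,
    relative to the reference model. Larger beta penalizes drifting from
    the reference more strongly.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy and reference agree (zero margin), the loss is log 2; as the policy comes to prefer the chosen response more strongly than the reference does, the loss falls toward zero.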