mutsumutsu/dpo-qwen-cot-merged-260205-tokenchg2024-1024
mutsumutsu/dpo-qwen-cot-merged-260205-tokenchg2024-1024 is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). It is optimized to strengthen Chain-of-Thought (CoT) reasoning and to improve the quality of structured responses, making it suitable for applications that require logical coherence and well-formed outputs.
Model Overview
This model, mutsumutsu/dpo-qwen-cot-merged-260205-tokenchg2024-1024, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was trained with Direct Preference Optimization (DPO) using the Unsloth library, and the result is a fully merged 16-bit checkpoint that requires no adapter loading.
Key Optimizations
The primary objective of this fine-tuning was to align the model's responses with preferred outputs, with a specific focus on:
- Enhanced Reasoning: Improving Chain-of-Thought (CoT) capabilities.
- Structured Response Quality: Generating more coherent and well-formed outputs based on a preference dataset.
Training Configuration
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Method: DPO (Direct Preference Optimization)
- Epochs: 1
- Learning Rate: 1e-07
- Max Sequence Length: 2048
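To make the training objective above concrete, here is a minimal sketch of the per-pair DPO loss. The card does not state the beta (KL-penalty) coefficient, so the default of 0.1 used here is an assumption; the log-probabilities are summed token log-probs of each full response under the trained policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a complete response
    under the policy or the frozen reference model. beta is assumed to
    be 0.1 here (not stated in the card); it controls how far the
    policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); log1p keeps it stable
    return math.log1p(math.exp(-margin))
```

With a zero margin the loss is log 2; as the policy prefers the chosen response more strongly than the reference does, the loss falls toward zero, which is what the single low-learning-rate epoch above is nudging the model toward.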
Intended Use Cases
This model is particularly well-suited for applications where:
- Logical Reasoning is critical, benefiting from its CoT optimization.
- High-Quality, Structured Outputs are required, such as in question-answering, summarization, or content generation tasks demanding clear organization.
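Because the checkpoint is fully merged, it can be prompted like any Qwen chat model. The sketch below shows the ChatML message layout that Qwen-family models use; in practice you would let the tokenizer's `apply_chat_template` produce this string, so this is only an illustration of the wire format.

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts into the ChatML
    layout used by Qwen chat models. Illustrative only: prefer
    tokenizer.apply_chat_template in real code."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "user",
     "content": "Think step by step: which is larger, 7/9 or 11/14?"},
])
```

Phrasing requests with an explicit "think step by step" cue, as above, plays to the CoT-focused preference tuning described earlier.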
Licensing
The model is distributed under the MIT License, matching the license of its preference training data. Users must also comply with the license terms of the base model, Qwen/Qwen3-4B-Instruct-2507.