Name: Diocletianus/dpo-qwen-cot-merged0207 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Diocletianus

Model Overview

Diocletianus/dpo-qwen-cot-merged0207 is a 4 billion parameter language model derived from the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using Direct Preference Optimization (DPO) with the Unsloth library, resulting in a merged 16-bit weight model that requires no adapter loading.

Key Capabilities

Enhanced Reasoning: Optimized through DPO to improve Chain-of-Thought (CoT) reasoning, enabling more structured and logical problem-solving.
Aligned Responses: Fine-tuned to align its outputs with preferred response patterns, leading to higher quality and more relevant generations.
Structured Output: Focuses on improving the quality of structured responses, making it suitable for tasks requiring specific formats or coherent arguments.

Training Details

The model underwent 1 epoch of DPO training with a learning rate of 1e-07 and a beta value of 0.1. The maximum sequence length used during training was 1024 tokens. The training utilized the u-10bei/dpo-dataset-qwen-cot dataset. The LoRA configuration (r=8, alpha=16) was merged directly into the base model.

Usage

This merged model can be directly integrated and used with the transformers library for inference, providing a straightforward deployment experience. Users should adhere to the MIT License of the training data and the original base model's license terms.

Overview

Model Overview

Key Capabilities

Training Details

Usage

Full Model Card (README)