Model Overview
KS150/testDPO is a 4-billion-parameter language model built on the Qwen3-4B-Instruct-2507 base model. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library to align its responses with preferred outputs. The model ships as fully merged 16-bit weights, so no adapter loading is required.
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning capabilities.
- Structured Response Quality: Focuses on delivering higher quality and more structured outputs.
- Direct Preference Optimization: Utilizes DPO for better alignment with desired response patterns.
Training Details
The model underwent 3 epochs of DPO training with a learning rate of 7e-04 and a beta value of 0.1, using a maximum sequence length of 256. Training used a LoRA configuration (r=8, alpha=16) whose adapters have since been merged into the base model. The training data comes from the u-10bei/dpo-dataset-qwen-cot dataset.
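For reference, the per-example DPO objective that this kind of training optimizes can be sketched in plain Python. This is an illustrative re-implementation of the standard DPO loss with the beta=0.1 used here, not the actual Unsloth training code; the function name and arguments are assumptions for the sketch.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Illustrative per-example DPO loss (not the actual training code).

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy model and the frozen reference model.
    The loss is -log(sigmoid(beta * margin)), where the margin compares
    how much the policy shifts toward the chosen response relative to
    the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(logits)) = log(1 + exp(-logits))
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy matches the reference on both responses the margin is zero and the loss is log(2); the loss falls below that as the policy learns to prefer the chosen response.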
Usage
As a merged model, KS150/testDPO can be used directly with the transformers library for inference, supporting torch.float16 and device_map="auto" for efficient deployment. The model is released under the MIT License; users must also comply with the base model's license terms.
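A minimal inference sketch along those lines is shown below. The prompt and generation settings are illustrative assumptions; the model load itself is kept under a main guard since it downloads several gigabytes of weights.

```python
# Illustrative generation settings (assumptions, tune for your use case)
GEN_KWARGS = {"max_new_tokens": 512, "temperature": 0.7, "do_sample": True}

def build_messages(user_prompt):
    # Chat-style input for the instruct-tuned model
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "KS150/testDPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # merged 16-bit weights
        device_map="auto",
    )

    messages = build_messages("Explain chain-of-thought prompting briefly.")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, **GEN_KWARGS)
    # Decode only the newly generated tokens
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```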