XKilin/DPO_v1_20260207

Text Generation · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Feb 7, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

XKilin/DPO_v1_20260207 is a 4-billion-parameter, Qwen3-based causal language model fine-tuned by XKilin with Direct Preference Optimization (DPO). It supports a 40960-token context length and is optimized to improve Chain-of-Thought reasoning and structured response quality, targeting applications that require aligned, high-quality text generation.


Model Overview

XKilin/DPO_v1_20260207 is a 4-billion-parameter language model developed by XKilin on top of Qwen3-4B-Instruct-2507. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library and is distributed as a fully merged 16-bit checkpoint, so no adapter loading is required.

Key Capabilities

  • Enhanced Reasoning: Optimized through DPO to improve Chain-of-Thought reasoning abilities.
  • Structured Response Quality: Aligned to produce higher quality and more structured outputs based on preference datasets.
  • Direct Use: Distributed as a merged checkpoint, ready for direct use with transformers, as shown in the loading sketch below.
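
Because the checkpoint is fully merged, it loads like any standard causal LM. Below is a minimal inference sketch, assuming the repository id above and a recent transformers release with Qwen3 support; the prompt text is illustrative only:

```python
# Minimal inference sketch (assumes a recent transformers release with Qwen3 support).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XKilin/DPO_v1_20260207"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# Qwen3-style chat formatting via the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Explain step by step why 17 is prime."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```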

Training Details

The model was trained for 2 epochs with a learning rate of 1e-4 and a DPO beta of 0.1, using a maximum sequence length of 1024 tokens during DPO training. Training used the u-10bei/dpo-dataset-qwen-cot preference dataset. The model is released under the MIT License; users must also comply with the license terms of the original base model.
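
For reference, a comparable run could be reproduced with trl's DPOTrainer using the hyperparameters listed above. This is a hedged approximation, not the author's exact recipe: the card states training was done via Unsloth, the dataset column layout (prompt/chosen/rejected) is assumed, and the output path is a placeholder.

```python
# Hypothetical reproduction sketch with trl (the card's actual run used Unsloth).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-4B-Instruct-2507"  # base model named in the card
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Preference dataset named in the card; prompt/chosen/rejected columns assumed.
dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

config = DPOConfig(
    output_dir="dpo_v1",    # placeholder path
    beta=0.1,               # DPO beta from the card
    learning_rate=1e-4,     # from the card
    num_train_epochs=2,     # from the card
    max_length=1024,        # max sequence length during DPO, from the card
)

trainer = DPOTrainer(
    model=model,            # with no ref_model given, trl clones one internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model("dpo_v1")  # saved merged, as the checkpoint is distributed
```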