takeshi200ok/qwen3-4B-dpo-anti-fence-600 is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model, fine-tuned with Direct Preference Optimization (DPO) by takeshi200ok. The fine-tuning targets stronger Chain-of-Thought reasoning and more structured responses, making the model suited to tasks that require high-quality, preference-aligned outputs.
## Model Overview
This model, takeshi200ok/qwen3-4B-dpo-anti-fence-600, is a 4-billion-parameter language model based on Qwen3-4B-Instruct-2507. It was fine-tuned by takeshi200ok using Direct Preference Optimization (DPO) via the Unsloth library, starting from an SFT LoRA adapter. The final artifact is a fully merged 16-bit model, so no adapter loading is required when using it with `transformers`.
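Because the checkpoint is fully merged, loading is a standard `from_pretrained` call. The sketch below assumes a recent `transformers` release; the `bfloat16` dtype is an inference from the card's "16-bit" note, not a documented requirement.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "takeshi200ok/qwen3-4B-dpo-anti-fence-600"

# The DPO weights are already merged into the base model,
# so no PEFT/adapter loading step is required.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # merged artifact is 16-bit per the card
    device_map="auto",       # requires accelerate; uses GPU if available
)
```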
## Key Capabilities
- Enhanced Reasoning: Optimized to strengthen Chain-of-Thought reasoning.
- Structured Responses: Produces higher-quality, more structured outputs.
- Preference Alignment: Aligned with preferred outputs through DPO training on a dedicated preference dataset.
## Training Details
The model was trained for one epoch of DPO with a learning rate of 1e-5, a DPO beta of 0.1, and a maximum sequence length of 3072 tokens. The training data was u-10bei/dpo-dataset-qwen-cot.
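The card does not publish the full training script; the following is a hypothetical reconstruction using TRL's `DPOTrainer` with the stated hyperparameters. The author's actual Unsloth pipeline, which started from an SFT LoRA adapter, may differ in detail, and argument names follow recent TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen3-4B-Instruct-2507"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference dataset named on the card.
dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

config = DPOConfig(
    output_dir="qwen3-4b-dpo",
    num_train_epochs=1,   # one epoch, per the card
    learning_rate=1e-5,   # stated learning rate
    beta=0.1,             # DPO KL-penalty coefficient, per the card
    max_length=3072,      # stated maximum sequence length
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```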
## Usage & License
Because the weights are fully merged, the model can be loaded directly with the `transformers` library, as shown above. It is released under the MIT License, in keeping with the terms of its training dataset; users must also comply with the license terms of the original base model.
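Continuing from the loading sketch above, here is a minimal generation example using the model's chat template; the prompt and generation settings are illustrative only.

```python
messages = [
    {"role": "user", "content": "Walk through the reasoning: why does ice float on water?"}
]

# Qwen3 instruct models ship a chat template, so apply_chat_template
# handles role formatting and special tokens.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```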