Model Overview
sfutenma/dpo-qwen3_4b-cot-merged_v260302-093614 is a 4-billion-parameter language model based on the Qwen3 architecture. It was fine-tuned with Direct Preference Optimization (DPO) via the Unsloth library, starting from sfutenma/lora_structeval_t_qwen3_4b_v260228-172650. This release ships the fully merged 16-bit weights, so no LoRA adapter loading is required.
Key Capabilities
- Enhanced Reasoning: Optimized through DPO to improve Chain-of-Thought (CoT) reasoning abilities.
- Structured Response Quality: Specifically aligned to produce higher quality, structured outputs based on a preference dataset.
- Efficient Deployment: Provided as a fully merged model, ready for direct use with transformers without additional configuration.
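Because the weights are fully merged, the model loads like any standard causal LM. The sketch below is a minimal, hedged usage example (the generation parameters are illustrative defaults, not values recommended by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sfutenma/dpo-qwen3_4b-cot-merged_v260302-093614"

def build_messages(question: str) -> list[dict]:
    # Single-turn chat in the standard role/content message format.
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 512) -> str:
    # Load the merged 16-bit weights directly; no adapter attach step is needed.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Let the tokenizer's own chat template format the prompt.
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate("Explain step by step why 17 is prime.")` returns the model's Chain-of-Thought style answer as a plain string.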
Training Details
The model was trained for 5 epochs with a learning rate of 1e-6 and a DPO beta of 0.1, using a maximum sequence length of 768 tokens during DPO training. The base model for this fine-tuning was unsloth/Qwen3-4B-Instruct-2507, and the preference data came from u-10bei/dpo-dataset-qwen-cot.
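For intuition about what the beta of 0.1 controls, the standard per-example DPO loss can be sketched in plain Python. The log-probability values in the usage note are made-up illustrative numbers, not outputs of this model:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Beta scales how strongly the policy is pushed away from the reference
    model; the 0.1 default here matches the value reported for this run.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # Numerically plain logistic; fine for illustration at these magnitudes.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy raises the chosen response's log-probability relative to the reference more than the rejected one's, the loss drops; e.g. `dpo_loss(-1, -5, -2, -4)` is smaller than `dpo_loss(-2, -4, -1, -5)`.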
Usage Considerations
This model is well suited to tasks where improved reasoning and structured, aligned responses are critical. Users should be aware that the model is released under the MIT License, in keeping with the dataset's terms, and that compliance with the original base model's license terms is also required.