Chiaki111/dpo-qwen-cot-merged_dpo_v1_l2

Text Generation · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 3, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Chiaki111/dpo-qwen-cot-merged_dpo_v1_l2 is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) with the Unsloth library. The checkpoint ships as full-merged 16-bit weights, so no adapter loading is required. It targets applications where outputs aligned with human preferences are desired.


Model Overview

Chiaki111/dpo-qwen-cot-merged_dpo_v1_l2 is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. The model has undergone Direct Preference Optimization (DPO), a fine-tuning technique that aligns a model's outputs more closely with human preferences. Training was performed with the Unsloth library for efficiency.
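
For reference, DPO optimizes the following objective (Rafailov et al., 2023), where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen base model, $(x, y_w, y_l)$ is a prompt paired with preferred and dispreferred responses, and $\beta$ (0.1 in this run, per the training details below) controls how far the policy may drift from the reference:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$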

Key Characteristics

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Parameter Count: 4 billion parameters
  • Context Length: 40960 tokens (inherited from base model)
  • Weight Format: Full-merged 16-bit weights; no adapter loading is required for deployment (see the loading sketch below)
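
Because the weights are fully merged, the checkpoint loads like any standard causal LM with transformers, with no PEFT or adapter step. A minimal sketch (the prompt text is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Chiaki111/dpo-qwen-cot-merged_dpo_v1_l2"

# Merged 16-bit weights: no separate adapter loading is needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto",
)

# Illustrative prompt; Qwen3 instruct models use a chat template.
messages = [{"role": "user", "content": "Summarize Direct Preference Optimization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```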

Training Details

The DPO fine-tuning was conducted over 1 epoch with a learning rate of 1e-06 and a beta value of 0.1. The maximum sequence length used during training was 1024 tokens.
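
The card does not include the training script or dataset, so the sketch below only reconstructs an equivalent configuration with TRL's DPOTrainer using the stated hyperparameters (1 epoch, learning rate 1e-06, beta 0.1, max length 1024). The actual run used Unsloth for efficiency, and the preference dataset shown here is a placeholder:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder preference pairs; the real dataset is not documented in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["Explain overfitting briefly."],
    "chosen": ["Overfitting is when a model fits noise in the training data ..."],
    "rejected": ["Overfitting is good."],
})

config = DPOConfig(
    output_dir="dpo-qwen-cot",
    num_train_epochs=1,   # stated: 1 epoch
    learning_rate=1e-6,   # stated: 1e-06
    beta=0.1,             # stated: beta = 0.1
    max_length=1024,      # stated: max sequence length of 1024 tokens
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```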

Intended Use

This model is suitable for applications where a DPO-tuned Qwen3-4B variant is desired, particularly for tasks that benefit from preference-based alignment. Its full-merged weights simplify deployment by removing the need for separate adapter management.
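
As one illustration of that simplified deployment, the merged checkpoint can be pointed at directly by an inference engine. A minimal vLLM sketch (sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

# The merged checkpoint loads like any standard Hugging Face model repo.
llm = LLM(model="Chiaki111/dpo-qwen-cot-merged_dpo_v1_l2", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=256)

conversation = [{"role": "user", "content": "Give one use case for a DPO-tuned assistant."}]
outputs = llm.chat(conversation, sampling_params=params)
print(outputs[0].outputs[0].text)
```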