kenzrx/dpo-ori-qwen-cot-merged

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 11, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

The kenzrx/dpo-ori-qwen-cot-merged model is a 4-billion-parameter, Qwen3-based, instruction-tuned language model fine-tuned in two stages: Supervised Fine-Tuning (SFT) on high-quality reference answers, followed by Direct Preference Optimization (DPO) to align outputs with preferred responses. By optimizing for chosen outputs over rejected ones, the model excels at generating structured, aligned responses and is well suited to tasks that require precise formatting and adherence to a desired output structure.


Model Overview

The kenzrx/dpo-ori-qwen-cot-merged model is a 4 billion parameter language model built upon the Qwen3-4B-Instruct-2507 base. It has undergone a two-stage fine-tuning process to enhance its response quality and alignment.

Training Stages

  1. Supervised Fine-Tuning (SFT): The model was first fine-tuned on the structured_data_with_cot_dataset_v2 dataset to learn high-quality reference answers and the required output formatting.
  2. Direct Preference Optimization (DPO): After SFT, the model was further optimized with DPO, using the same structured_data_with_cot_dataset_v2 as a preference dataset. This stage trains the model to prefer "chosen" outputs over "rejected" ones for a given prompt, improving response alignment and structural quality (see the training sketch after this list).
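The DPO stage maps naturally onto TRL's DPOTrainer. The sketch below is illustrative only: the dataset path, hyperparameters, and training settings are assumptions for demonstration, not values published with this model.

```python
# Illustrative sketch of the DPO stage with TRL's DPOTrainer.
# Dataset path and hyperparameters below are assumptions, not
# values published with this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the intermediate SFT checkpoint named in the lineage.
model_name = "kenzrx/qwen3-4b-sft-merged"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data needs "prompt", "chosen", and "rejected" columns;
# the local JSONL path here is an assumption.
dataset = load_dataset(
    "json", data_files="structured_data_with_cot_dataset_v2.jsonl"
)["train"]

config = DPOConfig(
    output_dir="dpo-ori-qwen-cot",
    beta=0.1,                        # preference-penalty strength (assumed)
    per_device_train_batch_size=2,
    learning_rate=5e-6,
)

trainer = DPOTrainer(
    model=model,                     # ref_model defaults to a frozen copy
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # older TRL versions use tokenizer=
)
trainer.train()
```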

Key Characteristics

  • Full-merged 16-bit weights: No adapter loading is required, which simplifies deployment (see the loading sketch after this list).
  • DPO Alignment: Optimized to produce responses that are aligned with preferred examples, making it suitable for tasks where output structure and quality are critical.
  • Lineage: Derived from Qwen/Qwen3-4B-Instruct-2507, with an intermediate SFT stage (kenzrx/qwen3-4b-sft-merged).
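Because the repository ships fully merged 16-bit weights, loading is a single from_pretrained call with no PEFT step. A minimal sketch; device_map="auto" assumes an accelerator is available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kenzrx/dpo-ori-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Weights are already merged at BF16, so no PeftModel/adapter step is needed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```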

Usage

This model is designed for direct use with the transformers library, supporting standard causal language model inference workflows. Its DPO training makes it particularly effective for generating structured and high-quality text outputs.
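A minimal end-to-end inference sketch using the tokenizer's chat template; the prompt and sampling parameters are illustrative, not values recommended by the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kenzrx/dpo-ori-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example prompt targeting the model's structured-output strengths.
messages = [
    {
        "role": "user",
        "content": "Extract the order below as JSON with keys item, qty, "
                   "and price: 'Two lattes at $4.50 each.'",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,   # sampling settings are illustrative
        top_p=0.9,
    )

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```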