koguma-ai/sft-dpo-qwen-cot-merged0207

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 7, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

koguma-ai/sft-dpo-qwen-cot-merged0207 is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model developed by koguma-ai and fine-tuned with a two-stage SFT-then-DPO pipeline. It is optimized for structured output generation and Chain-of-Thought (CoT) reasoning, with a 40,960-token context length. The model learns to produce coherent, step-by-step responses from datasets built for structured data and preference alignment, and ships as fully merged 16-bit weights for direct use with the Hugging Face Transformers library.


Overview

koguma-ai/sft-dpo-qwen-cot-merged0207 is a 4-billion-parameter language model fine-tuned from Qwen3-4B-Instruct-2507. To improve its reasoning and output quality, koguma-ai applied a specialized two-stage training pipeline: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO).

Key Capabilities

  • Structured Output Generation: The SFT stage specifically trains the model to produce outputs in a structured format.
  • Chain-of-Thought (CoT) Reasoning: Fine-tuned to generate step-by-step reasoning, improving the transparency and accuracy of its responses.
  • Preference Alignment: DPO training further refines the model's outputs based on preferred responses, leading to more aligned and high-quality generations.
  • Direct Usage: Provided as full-merged 16-bit weights, eliminating the need for adapter loading and simplifying deployment with the transformers library.
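Because the weights are published fully merged, no adapter loading is needed: the checkpoint works with the standard Transformers `from_pretrained` pattern. The sketch below loads the model and runs a chat-style generation; the repo id comes from this card, while the prompt, sampling settings, and hardware mapping are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "koguma-ai/sft-dpo-qwen-cot-merged0207"

# Merged BF16 weights: plain from_pretrained, no PEFT/adapter step required.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # picks up the published BF16 weights
    device_map="auto",    # requires accelerate; or move the model manually
)

# The chat template converts a message list into the model's expected prompt format.
messages = [
    {
        "role": "user",
        "content": "Extract the fields from: 'Order #123, 2 units, $40'. "
                   "Think step by step.",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here only to make the structured output reproducible; sampling parameters can be tuned per task.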

Training Details

The model's training involved:

  • SFT Stage: Utilized the u-10bei/structured_data_with_cot_dataset_512_v2 dataset with a LoRA configuration (r=64, alpha=128) and an assistant-only loss strategy with CoT masking.
  • DPO Stage: Applied a new LoRA adapter (r=8, alpha=16) and trained on the u-10bei/dpo-dataset-qwen-cot dataset to align with preferred outputs.
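The two LoRA configurations described above could be expressed with the PEFT library roughly as follows. This is a sketch only: the rank and alpha values come from this card, but the target modules and everything else are assumptions the card does not state.

```python
from peft import LoraConfig

# SFT stage: r=64, alpha=128, as stated in the training details.
sft_lora = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not from the card
)

# DPO stage: a fresh, smaller adapter (r=8, alpha=16) trained on top of the
# merged SFT weights, then merged again into the final checkpoint.
dpo_lora = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not from the card
)
```

Using a new, lower-rank adapter for DPO is a common pattern: the preference stage makes a smaller correction to an already fine-tuned model, so it needs less capacity than the SFT stage.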

Good For

  • Applications requiring structured data extraction or generation.
  • Tasks benefiting from explicit reasoning steps (Chain-of-Thought).
  • Scenarios where high-quality, preference-aligned responses are crucial.