Name: koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: koguma-ai

Model Overview

The koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03 is a 4 billion parameter language model built upon the Qwen3 architecture. Developed by koguma-ai, this model undergoes a unique two-stage training process: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO), leveraging the Unsloth library. This approach aims to enhance the model's ability to generate structured outputs and perform Chain-of-Thought (CoT) reasoning.

Key Training Details

SFT Stage: The base model was initially fine-tuned using the u-10bei/structured_data_with_cot_dataset_512_v2 dataset. This stage focused on teaching the model structured output generation and CoT reasoning, utilizing an assistant-only loss strategy with CoT masking.
DPO Stage: After merging the SFT LoRA adapter, a new LoRA adapter was applied for DPO training. This stage used the u-10bei/dpo-dataset-qwen-cot dataset to further align the model's outputs with preferred responses.

Features and Usage

Merged Weights: This repository provides the full-merged 16-bit weights, eliminating the need for adapter loading.
Optimized for Reasoning: The two-stage fine-tuning process specifically targets improved structured output and Chain-of-Thought capabilities.
Direct Use: The model can be directly loaded and used with the transformers library, as demonstrated in the provided Python example.

License

The model operates under the Apache 2.0 license, consistent with the terms of its base model.

Overview

Model Overview

Key Training Details

Features and Usage

License

Full Model Card (README)