koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03
The koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03 is a 4 billion parameter Qwen3-based causal language model, fine-tuned by koguma-ai using a two-stage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) pipeline. This model is specifically optimized for structured output generation and Chain-of-Thought (CoT) reasoning. It features full-merged 16-bit weights and supports a context length of 40960 tokens, making it suitable for tasks requiring detailed reasoning and structured responses.
Loading preview...
Model Overview
The koguma-ai/sft-dpo-qwen-cot-merged0207_unsloth_03 is a 4 billion parameter language model built upon the Qwen3 architecture. Developed by koguma-ai, this model undergoes a unique two-stage training process: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO), leveraging the Unsloth library. This approach aims to enhance the model's ability to generate structured outputs and perform Chain-of-Thought (CoT) reasoning.
Key Training Details
- SFT Stage: The base model was initially fine-tuned using the u-10bei/structured_data_with_cot_dataset_512_v2 dataset. This stage focused on teaching the model structured output generation and CoT reasoning, utilizing an assistant-only loss strategy with CoT masking.
- DPO Stage: After merging the SFT LoRA adapter, a new LoRA adapter was applied for DPO training. This stage used the u-10bei/dpo-dataset-qwen-cot dataset to further align the model's outputs with preferred responses.
Features and Usage
- Merged Weights: This repository provides the full-merged 16-bit weights, eliminating the need for adapter loading.
- Optimized for Reasoning: The two-stage fine-tuning process specifically targets improved structured output and Chain-of-Thought capabilities.
- Direct Use: The model can be directly loaded and used with the
transformerslibrary, as demonstrated in the provided Python example.
License
The model operates under the Apache 2.0 license, consistent with the terms of its base model.