Model Overview
kabuizuchi-trading/dpo-qwen-cot-merged is a fine-tuned version of the kabuizuchi-trading/qwen3-4b-lora-structured base model. It was trained with Direct Preference Optimization (DPO), implemented via the Unsloth library, to align its responses with preferred outputs. The fine-tuning specifically targets improved Chain-of-Thought reasoning and higher-quality structured responses.
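For intuition, DPO minimizes a logistic loss over the policy-versus-reference log-probability margin between the chosen and rejected responses. The sketch below illustrates that objective with made-up log-probabilities; only beta=0.08 comes from this card's training configuration.

```python
# Illustration of the per-example DPO objective:
#   L = -log sigmoid(beta * [(pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)])
# The log-probabilities below are invented for demonstration;
# beta=0.08 is the value reported in this model card.
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.08):
    """Per-example DPO loss: -log sigmoid(beta * margin difference)."""
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The policy prefers the chosen response more strongly than the reference
# model does, so the loss falls below log(2), its value at zero margin.
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-20.0,
                ref_chosen=-14.0, ref_rejected=-18.0)
```

Training pushes this loss down by widening the policy's chosen-over-rejected margin relative to the reference model, with beta controlling how strongly deviations from the reference are rewarded.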
Key Features & Training Details
- Optimization Method: Direct Preference Optimization (DPO).
- Base Model: kabuizuchi-trading/qwen3-4b-lora-structured.
- Training Objective: Enhance reasoning capabilities and structured output quality based on a preference dataset (u-10bei/dpo-dataset-qwen-cot).
- Merged Weights: This repository provides the fully merged 16-bit weights, so no adapter loading is required.
- Training Configuration: Trained for 1 epoch with a learning rate of 3e-07, beta of 0.08, and a maximum sequence length of 2048. The LoRA adapters (r=8, alpha=16) were merged into the base model.
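As a sketch only, the hyperparameters above map onto an Unsloth + TRL DPO run roughly as follows. The hyperparameters and repository names come from this card; the LoRA target modules, dataset column names, and output paths are assumptions, not confirmed training details.

```python
# Sketch: reproducing the stated DPO setup with Unsloth + TRL.
# lr=3e-07, beta=0.08, 1 epoch, max_seq_length=2048, LoRA r=8/alpha=16
# are from this card; everything else is an illustrative assumption.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="kabuizuchi-trading/qwen3-4b-lora-structured",
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(model, r=8, lora_alpha=16)

dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        beta=0.08,
        learning_rate=3e-7,
        num_train_epochs=1,
        max_length=2048,
        output_dir="dpo-qwen-cot",  # assumed path
    ),
    train_dataset=dataset,  # assumes "prompt"/"chosen"/"rejected" columns
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapters and save fully merged 16-bit weights,
# matching the checkpoint layout published in this repository.
model.save_pretrained_merged("dpo-qwen-cot-merged", tokenizer,
                             save_method="merged_16bit")
```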
Usage
Because the weights are fully merged, this model can be loaded directly with the transformers library for inference; no adapter (PEFT) loading step is required. It is suited to tasks that benefit from improved reasoning and structured output generation.
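A minimal loading sketch, assuming a recent transformers release with Qwen3 support: the checkpoint loads like any standard causal LM. The prompt and generation settings below are illustrative assumptions, not recommended values from the training run.

```python
# Minimal inference sketch with transformers (merged weights, no PEFT needed).
# The prompt and max_new_tokens are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kabuizuchi-trading/dpo-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "List three prime numbers and explain your reasoning step by step."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```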
License
The model and its training data are distributed under the MIT License. Users must also adhere to the original base model's license terms.