The amu870/PiG-v0-dpo model is a 4 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via Unsloth. The fine-tuning targets stronger reasoning, particularly Chain-of-Thought, and better-structured responses. It is intended for applications that need outputs aligned with preferred response patterns.
Overview
amu870/PiG-v0-dpo is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It leverages Direct Preference Optimization (DPO), implemented with the Unsloth library, to align its responses with preferred outputs. This model is provided with full-merged 16-bit weights, eliminating the need for adapter loading.
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning, leading to more structured and logical outputs.
- Improved Response Quality: Fine-tuned using a preference dataset to enhance the overall quality and alignment of generated responses.
- Direct Preference Optimization (DPO): Utilizes DPO for effective alignment, focusing on preferred output patterns.
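To make the alignment mechanism concrete, here is a minimal sketch of the per-pair DPO loss: the negative log-sigmoid of the beta-scaled difference between the policy's and reference model's log-probability margins on chosen vs. rejected responses. This is the standard DPO objective, not code from this model's training run; the function name and scalar inputs are illustrative.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    Lower loss means the policy prefers the chosen response more strongly
    than the reference does.
    """
    # beta scales how far the policy may drift from the reference.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)), written with log1p for numerical stability.
    return math.log1p(math.exp(-margin))
```

With equal log-probabilities everywhere the margin is zero and the loss is log(2); as the policy favors the chosen response relative to the reference, the loss falls below that.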
Training Details
The model was trained for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1. It used a maximum sequence length of 1024 during training. The training data source is the u-10bei/dpo-dataset-qwen-cot dataset.
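The reported hyperparameters map naturally onto a TRL-style DPO configuration. The sketch below is an assumption about the wiring, not the actual training script: the hyperparameter values, dataset id, and base-model id come from this card, while the trainer invocation (and the `build_trainer` helper) are illustrative.

```python
# Hyperparameters reported on the card.
dpo_hyperparams = {
    "num_train_epochs": 1,    # 1 epoch
    "learning_rate": 1e-7,
    "beta": 0.1,              # DPO preference-strength / KL-penalty coefficient
    "max_length": 1024,       # maximum sequence length during training
}
DATASET_ID = "u-10bei/dpo-dataset-qwen-cot"
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"

def build_trainer(model, ref_model, tokenizer, train_dataset):
    """Assumed wiring via TRL's DPOTrainer; requires `pip install trl`
    and is deliberately not executed here."""
    from trl import DPOConfig, DPOTrainer
    config = DPOConfig(output_dir="pig-v0-dpo", **dpo_hyperparams)
    return DPOTrainer(
        model=model,
        ref_model=ref_model,
        args=config,
        train_dataset=train_dataset,
        processing_class=tokenizer,
    )
```

The very low learning rate (1e-07) and single epoch are typical for DPO fine-tuning, where the goal is a gentle preference-driven adjustment rather than a large shift away from the instruction-tuned base.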
Usage
Because the 16-bit weights are fully merged, amu870/PiG-v0-dpo can be used directly with the transformers library for inference, with no adapter loading step. Users should adhere to the MIT License of the training data and to the original base model's license terms.
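A minimal inference sketch with transformers is shown below. The model id comes from this card; the `chat` helper and generation settings are illustrative, and the first call downloads roughly 8 GB of 16-bit weights, so a GPU with sufficient memory (or `device_map="auto"` offloading) is assumed.

```python
def chat(prompt: str,
         model_id: str = "amu870/PiG-v0-dpo",
         max_new_tokens: int = 256) -> str:
    """Generate a response from the merged model (illustrative helper)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # loads the merged 16-bit weights as stored
        device_map="auto",
    )
    # Format the prompt with the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `chat("Explain step by step why 17 is prime.")` should produce a Chain-of-Thought style answer, reflecting the reasoning focus of the DPO fine-tune.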