fumikawa/a25-v0006

Text Generation · 4B parameters · BF16 · 32k context · Published: Feb 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

fumikawa/a25-v0006 is a 4 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). This model is specifically optimized to improve reasoning capabilities, particularly Chain-of-Thought, and enhance structured response quality. It is designed for tasks requiring coherent logical progression and well-formatted outputs.


Model Overview

fumikawa/a25-v0006 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It was aligned using Direct Preference Optimization (DPO) via the Unsloth library. The model ships as fully merged 16-bit weights, so no separate adapter loading is required.

Key Capabilities

  • Improved Reasoning: Optimized to enhance Chain-of-Thought (CoT) reasoning, allowing for more logical and step-by-step problem-solving.
  • Structured Response Quality: Fine-tuned to produce higher quality, more structured outputs based on preferred response patterns.
  • DPO Alignment: Benefits from DPO training, aligning its responses more closely with desired human preferences.
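Because the model was tuned to favor step-by-step reasoning, prompting it in the standard chat-messages format with an explicit request to reason stepwise is a natural fit. A minimal sketch (the system-prompt wording below is an illustrative assumption, not prescribed by the model card):

```python
# Sketch: building a chat-format prompt that elicits step-by-step (CoT) reasoning.
# The system-prompt wording is illustrative, not part of the model card.

def build_cot_messages(question: str) -> list[dict]:
    """Return a chat-messages list asking the model to reason step by step."""
    return [
        {
            "role": "system",
            "content": "You are a careful assistant. Reason step by step "
                       "before giving a final answer.",
        },
        {"role": "user", "content": question},
    ]

messages = build_cot_messages(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
)
```

The resulting list can be passed directly to the tokenizer's `apply_chat_template` method before generation.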

Training Details

The model was trained for 3 epochs with a learning rate of 1e-06 and a DPO beta of 0.1, using a maximum sequence length of 1024. The LoRA adapter (r=8, alpha=16) was merged into the base model after training. The training objective focused on aligning responses with preferred outputs, particularly for reasoning and structured answers, using the u-10bei/dpo-dataset-qwen-cot dataset.
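The hyperparameters above map naturally onto a TRL-style DPO run. The following is a hedged sketch of how that configuration might be wired up with `trl` and `peft`; the card does not publish the actual training script, and the dataset split name and output directory are assumptions:

```python
# Hyperparameters reported on the model card for the DPO run.
DPO_HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "beta": 0.1,            # DPO preference-strength coefficient
    "max_seq_length": 1024,
    "lora_r": 8,
    "lora_alpha": 16,
}

def run_dpo_training(output_dir: str = "a25-v0006-dpo"):
    """Illustrative TRL-style wiring of the reported hyperparameters.

    Requires `trl`, `peft`, `datasets`, and `transformers`. This is a
    sketch under assumptions, not the card's actual training script.
    """
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import DPOConfig, DPOTrainer

    # Dataset named on the model card; split name is an assumption.
    train_dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

    args = DPOConfig(
        output_dir=output_dir,
        num_train_epochs=DPO_HYPERPARAMS["num_train_epochs"],
        learning_rate=DPO_HYPERPARAMS["learning_rate"],
        beta=DPO_HYPERPARAMS["beta"],
        max_length=DPO_HYPERPARAMS["max_seq_length"],
    )
    peft_config = LoraConfig(
        r=DPO_HYPERPARAMS["lora_r"],
        lora_alpha=DPO_HYPERPARAMS["lora_alpha"],
        task_type="CAUSAL_LM",
    )
    trainer = DPOTrainer(
        model="Qwen/Qwen3-4B-Instruct-2507",
        args=args,
        train_dataset=train_dataset,
        peft_config=peft_config,
    )
    trainer.train()
```

Calling `run_dpo_training()` would launch the run; the function is defined but not invoked here because it downloads the base model and dataset.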

Usage Considerations

This model is ready for direct use with the transformers library; the merged 16-bit weights load without any adapter step. Users should note that, per the dataset terms, the model's license follows the MIT License, and that compliance with the original base model's license terms is also required.
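Since the weights are fully merged, loading follows the standard transformers pattern with no PEFT step. A minimal sketch (the generation settings such as `max_new_tokens` are illustrative defaults, not recommendations from the card):

```python
# Sketch: loading the merged 16-bit weights directly with transformers
# (no adapter loading needed). Generation settings are illustrative defaults.

def generate_response(prompt: str, model_id: str = "fumikawa/a25-v0006") -> str:
    """Load the model and return a single generated reply for `prompt`."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the published BF16 weights
        device_map="auto",
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate_response("Explain why the sky is blue, step by step.")` would return the model's reply; the function is not invoked here because it downloads the 4B-parameter weights.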