fumikawa/a25-v0006

Text Generation · 4B parameters · BF16 · 32k context · Published: Feb 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

fumikawa/a25-v0006 is a 4 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO). This model is specifically optimized to improve reasoning capabilities, particularly Chain-of-Thought, and enhance structured response quality. It is designed for tasks requiring coherent logical progression and well-formatted outputs.


Model Overview

fumikawa/a25-v0006 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It was aligned using Direct Preference Optimization (DPO) via the Unsloth library. The model ships as fully merged 16-bit weights, so no separate adapter loading is required.

Key Capabilities

  • Improved Reasoning: Optimized to enhance Chain-of-Thought (CoT) reasoning, allowing for more logical and step-by-step problem-solving.
  • Structured Response Quality: Fine-tuned to produce higher quality, more structured outputs based on preferred response patterns.
  • DPO Alignment: Benefits from DPO training, aligning its responses more closely with desired human preferences.
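Because the model was tuned to favor step-by-step reasoning, prompting it in the standard chat-messages format with an explicit request to reason stepwise is a natural fit. A minimal sketch (the system-prompt wording below is an illustrative assumption, not prescribed by the model card):

```python
# Sketch: building a chat-format prompt that elicits step-by-step (CoT) reasoning.
# The system-prompt wording is illustrative, not part of the model card.

def build_cot_messages(question: str) -> list[dict]:
    """Return a chat-messages list asking the model to reason step by step."""
    return [
        {
            "role": "system",
            "content": "You are a careful assistant. Reason step by step "
                       "before giving a final answer.",
        },
        {"role": "user", "content": question},
    ]

messages = build_cot_messages(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
)
```

The resulting list can be passed directly to the tokenizer's `apply_chat_template` method before generation.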

Training Details

The model was trained for 3 epochs with a learning rate of 1e-06 and a DPO beta of 0.1, using a maximum sequence length of 1024. The LoRA adapter (r=8, alpha=16) was merged into the base model after training. The training objective focused on aligning responses with preferred outputs, particularly for reasoning and structured answers, using the u-10bei/dpo-dataset-qwen-cot dataset.
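The hyperparameters above map naturally onto a TRL-style DPO run. The following is a hedged sketch of how that configuration might be wired up with `trl` and `peft`; the card does not publish the actual training script, and the dataset split name and output directory are assumptions:

```python
# Hyperparameters reported on the model card for the DPO run.
DPO_HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "beta": 0.1,            # DPO preference-strength coefficient
    "max_seq_length": 1024,
    "lora_r": 8,
    "lora_alpha": 16,
}

def run_dpo_training(output_dir: str = "a25-v0006-dpo"):
    """Illustrative TRL-style wiring of the reported hyperparameters.

    Requires `trl`, `peft`, `datasets`, and `transformers`. This is a
    sketch under assumptions, not the card's actual training script.
    """
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import DPOConfig, DPOTrainer

    # Dataset named on the model card; split name is an assumption.
    train_dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

    args = DPOConfig(
        output_dir=output_dir,
        num_train_epochs=DPO_HYPERPARAMS["num_train_epochs"],
        learning_rate=DPO_HYPERPARAMS["learning_rate"],
        beta=DPO_HYPERPARAMS["beta"],
        max_length=DPO_HYPERPARAMS["max_seq_length"],
    )
    peft_config = LoraConfig(
        r=DPO_HYPERPARAMS["lora_r"],
        lora_alpha=DPO_HYPERPARAMS["lora_alpha"],
        task_type="CAUSAL_LM",
    )
    trainer = DPOTrainer(
        model="Qwen/Qwen3-4B-Instruct-2507",
        args=args,
        train_dataset=train_dataset,
        peft_config=peft_config,
    )
    trainer.train()
```

Calling `run_dpo_training()` would launch the run; the function is defined but not invoked here because it downloads the base model and dataset.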

Usage Considerations

This model is ready for direct use with the transformers library; the merged 16-bit weights load without any adapter step. Users should note that, per the dataset terms, the model's license follows the MIT License, and that compliance with the original base model's license terms is also required.
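Since the weights are fully merged, loading follows the standard transformers pattern with no PEFT step. A minimal sketch (the generation settings such as `max_new_tokens` are illustrative defaults, not recommendations from the card):

```python
# Sketch: loading the merged 16-bit weights directly with transformers
# (no adapter loading needed). Generation settings are illustrative defaults.

def generate_response(prompt: str, model_id: str = "fumikawa/a25-v0006") -> str:
    """Load the model and return a single generated reply for `prompt`."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the published BF16 weights
        device_map="auto",
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate_response("Explain why the sky is blue, step by step.")` would return the model's reply; the function is not invoked here because it downloads the 4B-parameter weights.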