fumikawa/a25-v0005

Text generation · Model size: 4B · Quant: BF16 · Ctx length: 32k · Published: Feb 6, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The fumikawa/a25-v0005 model is a 4 billion parameter instruction-tuned causal language model, fine-tuned by fumikawa from Qwen/Qwen3-4B-Instruct-2507. It utilizes Direct Preference Optimization (DPO) to enhance reasoning capabilities, specifically Chain-of-Thought, and improve structured response quality. With a context length of 40960 tokens, this model is optimized for tasks requiring logical deduction and coherent, well-structured outputs. It is suitable for applications where precise and reasoned responses are critical.


Model Overview

fumikawa/a25-v0005 is a 4 billion parameter language model, fine-tuned by fumikawa from the Qwen/Qwen3-4B-Instruct-2507 base model. This model leverages Direct Preference Optimization (DPO), implemented via the Unsloth library, to align its responses with preferred outputs. It is provided as full-merged 16-bit weights, eliminating the need for adapter loading.
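Because the weights ship fully merged in 16-bit, the model can be loaded directly with `transformers` with no PEFT/adapter step. A minimal loading sketch (the repository ID comes from this card; dtype and device placement are illustrative choices, not requirements):

```python
# Loading sketch for the full-merged BF16 weights.
# No adapter attachment (e.g. peft.PeftModel.from_pretrained) is needed.
MODEL_ID = "fumikawa/a25-v0005"

def load_model():
    """Load tokenizer and model directly from the merged checkpoint."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # weights are published in BF16
        device_map="auto",           # place layers across available devices
    )
    return model, tokenizer

# Usage:
# model, tokenizer = load_model()
# inputs = tokenizer("Explain step by step: ...", return_tensors="pt").to(model.device)
# output_ids = model.generate(**inputs, max_new_tokens=512)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```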

Key Optimizations

The primary objective of this fine-tuning was to significantly improve the model's reasoning capabilities, particularly in generating Chain-of-Thought explanations, and to enhance the overall structured response quality. This was achieved by training on a specific preference dataset (u-10bei/dpo-dataset-qwen-cot) for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1.
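For readers unfamiliar with DPO, the objective it optimizes during this fine-tuning can be sketched per example as `-log sigmoid(beta * ((log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))))`, where `y_w`/`y_l` are the chosen and rejected responses and `beta` is the 0.1 value noted above. A minimal numeric sketch (function name and log-probability inputs are illustrative):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    A larger gap between the policy's preference for the chosen response
    (relative to the frozen reference model) and its preference for the
    rejected response drives the loss toward zero.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # log pi(y_w) - log pi_ref(y_w)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi(y_l) - log pi_ref(y_l)
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Before any training, policy == reference, so the loss starts at log 2:
# dpo_loss(-10.0, -12.0, -10.0, -12.0) ~= 0.693
```

In practice the trainer (here Unsloth's DPO integration) computes these sequence log-probabilities from the policy and a frozen reference copy of the base model; the sketch only shows the scalar objective.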

Technical Specifications

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Fine-tuning Method: DPO
  • Parameter Count: 4 Billion
  • Max Sequence Length: 1024 (during DPO training)
  • Context Length: 40960 tokens (inherited from base model)
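Since the model is fine-tuned from a Qwen3 instruct checkpoint, prompts should follow the base model's chat format; `tokenizer.apply_chat_template` is the authoritative way to produce it. As a rough sketch of what that template looks like, assuming the ChatML-style tags used by the Qwen instruct family:

```python
def build_chatml_prompt(messages):
    """Format chat messages in the ChatML style used by Qwen instruct models.

    Illustrative only -- prefer tokenizer.apply_chat_template(messages,
    add_generation_prompt=True) with the model's own tokenizer.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)
```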

Recommended Use Cases

This model is particularly well-suited for applications that require:

  • Enhanced Reasoning: Generating logical, step-by-step explanations (Chain-of-Thought).
  • Structured Output: Producing responses that adhere to specific formats or structures.
  • Instruction Following: Executing complex instructions with improved accuracy and coherence.
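When relying on the structured-output strength above, it is still worth validating model responses before consuming them, since generations can wrap JSON in prose or a fenced block. A small hypothetical helper for that pattern:

```python
import json

def parse_json_response(text):
    """Extract and parse a JSON object from a model response.

    Handles the common case where the model surrounds the JSON with
    explanatory prose or a ```json fenced block by slicing from the
    first '{' to the last '}'.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    return json.loads(text[start:end + 1])
```

Pairing a helper like this with a retry on `ValueError`/`JSONDecodeError` is a simple way to make structured-output pipelines robust.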

Licensing

The model is released under the MIT License, matching the license of its training dataset. Users must also comply with the license terms of the original base model.