bknyaz/Qwen3-0.6B-Math

Source: Hugging Face · Text generation · Concurrency cost: 1 · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Jan 29, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

bknyaz/Qwen3-0.6B-Math is a 0.8-billion-parameter language model fine-tuned from Qwen/Qwen3-0.6B on the GSM8K mathematical reasoning dataset. It is optimized for solving grade-school math problems and performs significantly better on such tasks than its base model. With a context length of 32,768 tokens, its primary use case is mathematical problem solving in resource-efficient applications.


Qwen3-0.6B-Math Overview

Qwen3-0.6B-Math is a 0.8-billion-parameter language model developed by bknyaz, derived from the base Qwen/Qwen3-0.6B model. It has been fine-tuned on the GSM8K (Grade School Math 8K) dataset of mathematical reasoning problems, and this targeted fine-tuning substantially improves its ability to handle arithmetic and logical math word problems.
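
The sketch below shows one way to run the model with the Hugging Face transformers library. It assumes a recent transformers release with Qwen3 support; the prompt is an illustrative word problem, not part of any fixed benchmark.

```python
# Minimal inference sketch, assuming a recent transformers release with Qwen3 support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bknyaz/Qwen3-0.6B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative grade-school word problem.
messages = [
    {"role": "user", "content": "A classroom has 4 boxes of pencils with 12 pencils each. "
                                "If 15 pencils are handed out, how many pencils remain?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```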

Key Capabilities

  • Enhanced Mathematical Reasoning: Demonstrates improved performance on grade school level mathematical word problems.
  • Fine-tuned for Specific Tasks: Optimized for numerical and logical problem-solving, making it suitable for applications requiring precise mathematical outputs.
  • Resource-Efficient: At roughly 0.8B parameters, it offers a good balance of capability and computational efficiency.

Performance

Evaluation on the GSM8K test split shows a substantial improvement in accuracy over the base model:

  • Qwen3-0.6B (Base): 21.0
  • Qwen3-0.6B-Math (Fine-tuned): 46.3

This indicates that the fine-tuning process effectively specialized the model for mathematical tasks.
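
For reference, a rough sketch of how such a GSM8K evaluation could be reproduced with the datasets library is shown below. It assumes the openai/gsm8k dataset on the Hub and the standard "#### <number>" answer format; the generate_answer helper is hypothetical (e.g. a wrapper around the model.generate call shown earlier), and the parsing details are an assumption, not necessarily the exact protocol behind the reported scores.

```python
# Rough GSM8K scoring sketch; generate_answer is a hypothetical helper,
# and the answer parsing below is illustrative.
import re
from datasets import load_dataset

def extract_final_number(text: str):
    # Reference answers end with "#### <number>"; for model output,
    # fall back to the last number that appears in the generation.
    match = re.search(r"####\s*([-+]?[\d,]*\.?\d+)", text)
    if match:
        value = match.group(1)
    else:
        numbers = re.findall(r"[-+]?[\d,]*\.?\d+", text)
        value = numbers[-1] if numbers else None
    return value.replace(",", "") if value else None

test_set = load_dataset("openai/gsm8k", "main", split="test")
correct = 0
for example in test_set:
    prediction = generate_answer(example["question"])  # hypothetical generation wrapper
    if extract_final_number(prediction) == extract_final_number(example["answer"]):
        correct += 1
print(f"GSM8K test accuracy: {correct / len(test_set):.1%}")
```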

Good for

  • Applications requiring accurate mathematical problem-solving at a foundational level.
  • Educational tools or platforms that need to generate or verify solutions to math problems.
  • Scenarios where a smaller, specialized model is preferred for efficiency without sacrificing mathematical accuracy.
  • As a baseline for further research into meta-merging or fine-tuning techniques for mathematical reasoning, as described in the associated blog post.