quangdung/Qwen2.5-7B-Math-Distill-Sens

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 22, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Qwen2.5-7B-Math-Distill-Sens is a 7.6 billion parameter language model created by quangdung, merging DeepSeek-R1-Distill-Qwen-7B and Qwen2.5-Math-7B using Sensitivity-aware Model Merging. This model is specifically optimized for mathematical reasoning tasks, aiming to significantly reduce output verbosity and inference cost while maintaining strong accuracy. It achieves an average accuracy of 66.9% on mathematical benchmarks with an average output token count of 701, representing a 75.2% reduction in output length compared to its base reasoning model.

Loading preview...

Model Overview

quangdung/Qwen2.5-7B-Math-Distill-Sens is a 7.6 billion parameter model developed by quangdung, resulting from the application of Sensitivity-aware Model Merging (Sens Merging) to two base models: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and Qwen/Qwen2.5-Math-7B. The primary goal of this merge is to create a model that retains the robust mathematical reasoning capabilities of DeepSeek-R1-Distill while drastically reducing the verbosity and token length of its outputs, thereby lowering inference costs.

Key Capabilities & Performance

  • Optimized Mathematical Reasoning: Achieves an average accuracy of 66.9% across various mathematical benchmarks, including College Math, GSM8K, MATH, Minerva Math, and OlympiadBench.
  • Reduced Output Verbosity: Produces significantly shorter outputs, with an average of 701 tokens per response. This represents a 75.2% reduction in output tokens compared to the DeepSeek-R1-Distill-Qwen-7B base model.
  • Cost-Effective Inference: The substantial reduction in output length directly translates to lower inference costs without requiring additional gradient-based fine-tuning.
  • Competitive Accuracy: Maintains strong reasoning performance, with only a 2.5-point average accuracy drop compared to the more verbose DeepSeek-R1-Distill-Qwen-7B.

When to Use This Model

This model is ideal for applications requiring accurate mathematical problem-solving where inference cost and output length are critical considerations. It offers an attractive trade-off between reasoning quality and efficiency, making it suitable for scenarios where concise, yet correct, mathematical explanations are preferred over lengthy chains of thought.