modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_6

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: May 20, 2026 | License: cc-by-nc-4.0 | Architecture: Transformer | Open Weights | Warm

modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_6 is a 4-billion-parameter language model based on the Qwen3-4B-Base architecture. It is a task arithmetic merge that combines a fine-tuned Qwen3-4B-Base model with the original base model in order to strengthen mathematical reasoning. A scaling coefficient of 0.6 sets how much of the fine-tuned model's weight change is applied on top of the base weights.


Overview

This model, modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_6, is a 4-billion-parameter language model built on the Qwen3-4B-Base architecture. It was created with a task arithmetic merge, which takes the difference between a fine-tuned model's weights and the base model's weights, scales it, and adds it back onto the base model. The merge targets mathematical reasoning.

Key Capabilities

  • Mathematical Reasoning: The model merges the fine-tuned checkpoint math_think_11_qwen3_4b_base_sft, itself derived from Qwen3-4B-Base, with the original Qwen/Qwen3-4B-Base.
  • Task Arithmetic: The merge applies the formula theta = theta_base + scaling * (theta_sft - theta_base) with a scaling coefficient of 0.6, so 60% of the weight change introduced by fine-tuning is carried into the merged model.
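
The task arithmetic formula above can be illustrated with a minimal sketch. This is not the script used to build this model. It assumes checkpoints can be represented as dictionaries mapping parameter names to flat lists of floats (a toy stand-in for real tensors), and the function name task_arithmetic_merge and the sample values are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
# Real checkpoints hold tensors; this representation is an assumption.
StateDict = Dict[str, List[float]]


def task_arithmetic_merge(base: StateDict, sft: StateDict, scaling: float) -> StateDict:
    """Apply theta = theta_base + scaling * (theta_sft - theta_base) per parameter."""
    if base.keys() != sft.keys():
        raise ValueError("base and fine-tuned checkpoints must share parameter names")
    merged: StateDict = {}
    for name, base_values in base.items():
        sft_values = sft[name]
        if len(base_values) != len(sft_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # The task vector is (sft - base); only a scaled share of it is added back.
        merged[name] = [b + scaling * (s - b) for b, s in zip(base_values, sft_values)]
    return merged


# Hypothetical toy values, not real model weights.
theta_base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
theta_sft = {"layer.weight": [2.0, 4.0], "layer.bias": [1.0]}

merged = task_arithmetic_merge(theta_base, theta_sft, scaling=0.6)
```

With a single fine-tuned model, the formula is algebraically the same as the linear interpolation 0.4 * theta_base + 0.6 * theta_sft. A scaling of 0 returns the base weights unchanged and a scaling of 1 returns the fine-tuned weights.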

Good For

  • Arithmetic Tasks: The model is suited to applications that need improved performance on arithmetic and mathematical reasoning problems, since the merge carries over part of the fine-tuned model's specialization.
  • Leveraging Qwen3-4B-Base: Users already familiar with or utilizing the Qwen3-4B-Base architecture can benefit from this version's enhanced mathematical focus.