modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_3

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: May 20, 2026 | License: cc-by-nc-4.0 | Architecture: Transformer | Open Weights | Warm

The modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_3 model is a 4-billion-parameter language model based on Qwen3-4B-Base. It is the result of a Task Arithmetic merge between a math_think_11 SFT fine-tune of Qwen3-4B-Base and the original Qwen/Qwen3-4B-Base, using a scaling coefficient of 0.3. The merge is intended to carry over part of the fine-tuned model's task-specific behavior in its target domain while retaining the base model's general capabilities.


Model Overview

modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_3 is a 4-billion-parameter language model built on Qwen3-4B-Base. It was created with the Task Arithmetic merge technique, which combines the original base checkpoint with a supervised fine-tuned checkpoint derived from it.

Key Characteristics

  • Architecture: Based on the Qwen3-4B-Base model.
  • Parameter Count: 4 billion parameters.
  • Merge Method: Task Arithmetic, a technique that takes the weight difference between a fine-tuned model and its base model (the "task vector"), scales it, and adds it back to the base weights.
  • Components: It merges a specialized math_think_11 Qwen3-4B-Base SFT (Supervised Fine-Tuned) model with the general Qwen/Qwen3-4B-Base.
  • Scaling Coefficient: The merge applies a scaling coefficient of 0.3. Under the standard Task Arithmetic formulation, this means the weight difference between the fine-tuned and base models is scaled to 30% before being added to the base weights.
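
The merge described in the list above can be sketched in a few lines. This is a minimal illustration, assuming the standard Task Arithmetic formulation (merged = base + scale × (fine-tuned − base)). The exact tooling and settings used to build this model are not stated here, and the function name, the toy parameter name `layer.weight`, and the toy values are all hypothetical. Real checkpoints hold tensors, but plain lists keep the example dependency-free.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def task_arithmetic_merge(base: StateDict, finetuned: StateDict, scale: float) -> StateDict:
    """Return base + scale * (finetuned - base) for every parameter."""
    merged: StateDict = {}
    for name, base_w in base.items():
        ft_w = finetuned[name]
        # (f - b) is the task vector entry; it is scaled, then added back to the base weight.
        merged[name] = [b + scale * (f - b) for b, f in zip(base_w, ft_w)]
    return merged


# Toy weights (illustrative only, not taken from the real models).
base = {"layer.weight": [1.0, 2.0, 4.0]}
finetuned = {"layer.weight": [2.0, 2.0, 0.0]}

# 0.3 is the scaling coefficient named in the model card.
merged = task_arithmetic_merge(base, finetuned, scale=0.3)
print(merged)
```

With `scale=0.0` the function returns the base weights unchanged, and with `scale=1.0` it reproduces the fine-tuned weights, so 0.3 sits closer to the base model.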

Use Case

This model is particularly suited for scenarios where a blend of general language understanding (from the base model) and specific task-oriented capabilities (from the fine-tuned model) is beneficial. The Task Arithmetic approach allows for a controlled integration of specialized knowledge without fully overwriting the base model's broader capabilities.
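
The "controlled integration" can be made concrete with a short formula. This assumes the standard single-task-vector form of Task Arithmetic. The symbols $\theta_{\text{base}}$, $\theta_{\text{ft}}$, and $\lambda$ are notation introduced here for illustration and do not come from the model card.

```latex
\theta_{\text{merged}}
  = \theta_{\text{base}} + \lambda \left( \theta_{\text{ft}} - \theta_{\text{base}} \right)
  = (1 - \lambda)\, \theta_{\text{base}} + \lambda\, \theta_{\text{ft}},
\qquad \lambda = 0.3
```

Under this assumption, each merged weight is a 70/30 interpolation between the base and fine-tuned weights, which is why the base model's behavior is only partly shifted toward the fine-tuned one.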