modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_1

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 4B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: May 20, 2026
  • License: cc-by-nc-4.0
  • Architecture: Transformer
  • Open Weights

modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_1 is a 4 billion parameter language model derived from Qwen3-4B-Base. It was created by modrill with the Task Arithmetic merge method, which combines a fine-tuned (SFT) version of Qwen3-4B-Base with the original Qwen3-4B-Base. The merge is meant to add the SFT model's specialization, particularly in mathematical reasoning, while keeping the general capabilities of the base model.


Model Overview

modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_1 is a 4 billion parameter language model built on the Qwen3-4B-Base architecture. It was developed by modrill using a model merging technique called Task Arithmetic.

Key Characteristics

  • Architecture: Based on the Qwen3-4B-Base model.
  • Parameter Count: 4 billion parameters.
  • Context Length: 32,768 tokens.
  • Merge Method: Task Arithmetic, which combines a fine-tuned version of Qwen3-4B-Base (specifically math_think_11_qwen3_4b_base_sft) with the original Qwen/Qwen3-4B-Base.
  • Scaling Coefficient: The merge applies a scaling coefficient of 0.1 to the difference between the SFT model's weights and the base model's weights (the task vector), so only a small fraction of the fine-tuned change is added to the base model.
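
The merge described in the last two bullets can be illustrated with a small sketch. This is not the tooling modrill used, which the card does not state; it only shows the arithmetic under the assumption that the merged weights are computed as base + 0.1 × (SFT − base). The function name `task_arithmetic_merge` and the toy weight dictionaries are hypothetical, and plain Python lists stand in for real tensors.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def task_arithmetic_merge(base: StateDict, sft: StateDict, scaling: float = 0.1) -> StateDict:
    """Return base + scaling * (sft - base) for every parameter.

    Hypothetical helper for illustration only; real merges operate on tensors.
    """
    if base.keys() != sft.keys():
        raise ValueError("base and SFT models must have the same parameter names")

    merged: StateDict = {}
    for name, base_w in base.items():
        sft_w = sft[name]
        if len(base_w) != len(sft_w):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The task vector is (sft - base); only a scaled fraction is added back.
        merged[name] = [b + scaling * (s - b) for b, s in zip(base_w, sft_w)]
    return merged


# Made-up weights, not taken from either model.
base = {"layer.weight": [1.0, 2.0, 3.0]}
sft = {"layer.weight": [2.0, 2.0, 1.0]}

merged = task_arithmetic_merge(base, sft, scaling=0.1)
print(merged["layer.weight"])  # approximately [1.1, 2.0, 2.8]
```

With a coefficient of 0 the result equals the base model, and with a coefficient of 1 it equals the SFT model, so 0.1 keeps the merged weights close to the base.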

Intended Use

This model is intended for applications that benefit from both the base model's general capabilities and the specialized behavior of the fine-tuned SFT model. Task Arithmetic aims to improve performance in a specific domain, likely mathematical reasoning given the math_think_11 prefix in the SFT model's name, without overwriting the base model's strengths.