modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_5

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: May 20, 2026 | License: cc-by-nc-4.0 | Architecture: Transformer | Open Weights | Cold

modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_5 is a 4-billion-parameter language model based on the Qwen3-4B-Base architecture. It is the result of a Task Arithmetic merge between a fine-tuned math_think_11 Qwen3-4B-Base SFT model and the base Qwen/Qwen3-4B-Base, using a scaling coefficient of 0.5. The merge aims to combine the strengths of both models, particularly in areas where the SFT model excels.

Overview

This model is built on the Qwen3-4B-Base architecture and was created with Task Arithmetic, a weight-merging technique that takes the difference between a fine-tuned model's weights and its base model's weights, scales it, and adds it back onto the base. Here the fine-tuned model is a math_think_11 Qwen3-4B-Base SFT model and the base is the original Qwen/Qwen3-4B-Base.

Key Characteristics

  • Architecture: Qwen3-4B-Base, a 4 billion parameter model.
  • Merge Method: Utilizes Task Arithmetic, a method for combining model weights. The formula applied is theta = theta_base + scaling * (theta_sft - theta_base).
  • Source Models: Merges a Supervised Fine-Tuned (SFT) version of Qwen3-4B-Base (math_think_11_qwen3_4b_base_sft) with the foundational Qwen/Qwen3-4B-Base.
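  • Scaling Coefficient: A scaling factor of 0.5 was applied during the merge. With this value the formula reduces to theta = 0.5 * theta_base + 0.5 * theta_sft, so each merged weight sits at the midpoint between its base and SFT values and half of the SFT model's learned weight change is retained.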
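
The merge formula listed above can be sketched on a toy example. This is a minimal illustration in plain Python, not the script used to produce this model: the function name task_arithmetic_merge, the parameter names, and the numeric values are all hypothetical, and a real merge would apply the same per-parameter rule to the full checkpoint tensors.

```python
# Toy illustration of a Task Arithmetic merge on tiny "state dicts".
# Parameter names and values are invented for this example.

def task_arithmetic_merge(theta_base, theta_sft, scaling):
    """Return theta_base + scaling * (theta_sft - theta_base) per parameter."""
    if theta_base.keys() != theta_sft.keys():
        raise ValueError("base and SFT checkpoints must share parameter names")
    merged = {}
    for name, base_values in theta_base.items():
        sft_values = theta_sft[name]
        if len(base_values) != len(sft_values):
            raise ValueError(f"shape mismatch for {name}")
        # (s - b) is the task vector entry; only a fraction of it is applied.
        merged[name] = [
            b + scaling * (s - b) for b, s in zip(base_values, sft_values)
        ]
    return merged


theta_base = {"layer.weight": [1.0, 2.0, -4.0], "layer.bias": [0.0]}
theta_sft = {"layer.weight": [3.0, 2.0, 0.0], "layer.bias": [1.0]}

merged = task_arithmetic_merge(theta_base, theta_sft, scaling=0.5)
print(merged)
# {'layer.weight': [2.0, 2.0, -2.0], 'layer.bias': [0.5]}
```

A scaling of 0.0 returns the base weights unchanged and 1.0 returns the SFT weights, so 0.5 lands halfway between the two.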

Potential Use Cases

This model suits applications that benefit from both the general capabilities of the base Qwen3 model and the specialization of the SFT version. The specific SFT task is not detailed, but the merge is intended to improve performance in areas where the math_think_11 SFT model was proficient, potentially mathematical reasoning or a specific domain.