modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_2
modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_2 is a 4-billion-parameter language model from modrill, built on the Qwen3-4B-Base architecture. It is the result of a Task Arithmetic merge between a math_think_11 SFT fine-tune of Qwen3-4B-Base and the original Qwen/Qwen3-4B-Base, using a scaling coefficient of 0.2. The merge is intended to carry part of the SFT model's specialization over to the base model while retaining the base model's general capabilities.
Overview
modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_2 is a 4-billion-parameter language model built on the Qwen3-4B-Base architecture. It was produced with Task Arithmetic merging: the difference between the parameters of the fine-tuned math_think_11_qwen3_4b_base_sft model and those of the original Qwen/Qwen3-4B-Base model (the task vector) is multiplied by a scaling coefficient of 0.2 and added back to the base parameters, blending the characteristics of the two models.
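The merge described above can be written as a single update rule. The symbols are introduced here for illustration and do not appear in the original card: \theta_{\text{base}} and \theta_{\text{sft}} denote the parameters of Qwen/Qwen3-4B-Base and of the SFT model, and \lambda denotes the scaling coefficient. This assumes the standard single-task-vector form of Task Arithmetic.

```latex
\theta_{\text{merged}}
  = \theta_{\text{base}} + \lambda \left( \theta_{\text{sft}} - \theta_{\text{base}} \right),
  \qquad \lambda = 0.2
\quad\Longrightarrow\quad
\theta_{\text{merged}} = 0.8\,\theta_{\text{base}} + 0.2\,\theta_{\text{sft}}
```

With only one task vector, the rule reduces to a linear interpolation that keeps 80% of the weight on the base parameters and puts 20% on the SFT parameters.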
Key Capabilities
- Parameter Blending: Utilizes Task Arithmetic to combine the learned features of a specialized SFT model with a robust base model.
- Qwen3-4B-Base Foundation: Benefits from the underlying capabilities and architecture of the Qwen3-4B-Base model.
- Configurable Scaling: The 0.2 scaling coefficient indicates a controlled integration of the SFT model's specific fine-tuning into the base model.
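As a rough illustration of how a scaling coefficient controls the blend, the following minimal sketch applies the Task Arithmetic rule to a toy "checkpoint" made of plain floats. The function name `task_arithmetic_merge` and the toy values are hypothetical; this is not the pipeline used to build this model. The same expression would also work on array or tensor values that support `+`, `-`, and `*`.

```python
def task_arithmetic_merge(base, sft, scale):
    """Return base + scale * (sft - base) for every parameter name.

    `base` and `sft` are dicts mapping parameter names to values that
    support arithmetic (floats here; arrays or tensors in practice).
    """
    if base.keys() != sft.keys():
        raise ValueError("base and SFT checkpoints must share parameter names")
    return {name: base[name] + scale * (sft[name] - base[name]) for name in base}


# Toy two-parameter "checkpoints" (illustrative values only).
base = {"w": 1.0, "b": 0.0}
sft = {"w": 2.0, "b": -1.0}

# scale = 0.0 reproduces the base model, scale = 1.0 reproduces the SFT model,
# and scale = 0.2 moves each parameter 20% of the way toward the SFT model.
merged = task_arithmetic_merge(base, sft, 0.2)
for scale in (0.0, 0.2, 1.0):
    print(scale, task_arithmetic_merge(base, sft, scale))
```

Sweeping `scale` in this way is one simple approach to comparing merges built with different coefficients.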
Good For
- Use cases requiring a model that integrates specific fine-tuning (from math_think_11_qwen3_4b_base_sft) while retaining the general capabilities of Qwen3-4B-Base.
- Exploring the effects of Task Arithmetic with different scaling coefficients on model performance.
- Applications that need a balance between the specialized knowledge of the SFT model and the general language understanding of the base model.
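For trying the model on the use cases above, a minimal loading sketch follows. It assumes the model is published on the Hugging Face Hub under the identifier in the title and that the `transformers` and `torch` packages are installed; the helper names `load_model` and `generate` are hypothetical, and the generation settings are placeholders rather than recommended values.

```python
MODEL_ID = "modrill/math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_2"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; assumes the repo exists on the Hugging Face Hub."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for `prompt` (placeholder generation settings)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because the merge starts from a base (non-chat) model, plain-text prompts are used here rather than a chat template; whether a chat template is appropriate is not stated in this card.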