sumith2425/model_sft_resta

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

sumith2425/model_sft_resta is a 1.5 billion parameter language model created by sumith2425 using the Task Arithmetic merge method. It is based on Qwen/Qwen2.5-1.5B-Instruct and combines two components, harmful_merged_model and sft_merged_model, and supports a 32,768-token context length.


Model Overview

sumith2425/model_sft_resta is a 1.5 billion parameter language model developed by sumith2425. It was created using the Task Arithmetic merge method, leveraging Qwen/Qwen2.5-1.5B-Instruct as its base model. This merging technique allows for the combination of specific characteristics from different pre-trained models.

Merge Details

This model integrates two distinct components:

  • harmful_merged_model: Included with a negative weight in the Task Arithmetic configuration, suggesting an intent to mitigate or invert certain characteristics from this component.
  • sft_merged_model: Included with a positive weight, indicating its features are intended to be enhanced or preserved.
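In mergekit, a Task Arithmetic merge along the lines described above could be expressed with a config such as the following. The weight values shown here are illustrative assumptions, not the model's published settings:

```yaml
merge_method: task_arithmetic
base_model: Qwen/Qwen2.5-1.5B-Instruct
models:
  - model: harmful_merged_model
    parameters:
      weight: -1.0   # negative weight: subtract this task vector
  - model: sft_merged_model
    parameters:
      weight: 1.0    # positive weight: add this task vector
dtype: bfloat16
```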

The merging process was configured to use the bfloat16 data type. The Task Arithmetic method itself is described in the paper "Editing Models with Task Arithmetic" (Ilharco et al., 2023).
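Task Arithmetic operates on parameter deltas, or "task vectors": each component's weights minus the base model's weights, scaled by a signed coefficient and added back to the base. A minimal NumPy sketch of the idea on toy tensors (the real merge applies this per-parameter across the full model, and the coefficients here are illustrative, not the model's actual weights):

```python
import numpy as np

# Toy stand-ins for one weight tensor of each model.
base = np.array([1.0, 2.0, 3.0])     # Qwen2.5-1.5B-Instruct (base)
harmful = np.array([1.5, 2.5, 3.5])  # "harmful_merged_model"
sft = np.array([0.5, 2.0, 4.0])      # "sft_merged_model"

# Task vectors: what each fine-tune changed relative to the base.
tv_harmful = harmful - base
tv_sft = sft - base

# Task Arithmetic: add the SFT delta, subtract the harmful delta.
merged = base + 1.0 * tv_sft + (-1.0) * tv_harmful

print(merged)  # base + (sft - base) - (harmful - base) → [0.  1.5 3.5]
```

The negative coefficient on the harmful task vector is what "inverts" that component's contribution: directions in weight space that the harmful model moved toward are moved away from instead.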

Key Characteristics

  • Parameter Count: 1.5 billion parameters.
  • Context Length: 32,768 tokens.
  • Base Model: Qwen/Qwen2.5-1.5B-Instruct.

Potential Use Cases

Because the merge adds one task vector and subtracts another, this model is likely intended to preserve the supervised fine-tuning behavior of sft_merged_model while mitigating the behavior captured by harmful_merged_model. Developers should evaluate it directly on their target tasks, since negative-weight merging can also affect general capabilities in unintended ways.