sumith2425/model_sft_dare_resta

Text Generation · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Concurrency Cost: 1 · Published: Mar 18, 2026 · Architecture: Transformer

sumith2425/model_sft_dare_resta is a 1.5 billion parameter language model based on Qwen2.5-1.5B-Instruct. It was created with the Task Arithmetic merge method, which combines the weights of multiple models to steer the base model's behavior, and it is intended for general language tasks where a compact model is sufficient.


Model Overview

sumith2425/model_sft_dare_resta is built on the Qwen2.5-1.5B-Instruct base architecture and was produced with the Task Arithmetic merge method. Task Arithmetic treats the difference between a fine-tuned model and its base as a "task vector" and adds a weighted combination of such vectors back onto the base weights, so each component model's behavior can be amplified or suppressed independently.
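In the standard formulation of task arithmetic (Ilharco et al., 2023), each component contributes a task vector, the element-wise difference between its weights and the base weights, and the merged model is the base plus a weighted sum of those vectors:

$$
\theta_{\text{merged}} = \theta_{\text{base}} + \sum_{i} \lambda_i \left(\theta_i - \theta_{\text{base}}\right)
$$

Here $\lambda_i$ is the weight assigned to component $i$; a negative $\lambda_i$ subtracts that component's task vector instead of adding it.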

Merge Details

This model results from merging two components, ./harmful_merged_model and ./dare_merged_model, with Qwen/Qwen2.5-1.5B-Instruct as the base. The Task Arithmetic method was applied with per-model weights, including a negative weight on ./harmful_merged_model. Because a negative weight subtracts that component's task vector from the base, the configuration suggests the merge is intended to suppress the behaviors that component encodes rather than add them.
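As a concrete illustration, below is a minimal PyTorch sketch of such a merge. The weight values are hypothetical: the card states only that ./harmful_merged_model carries a negative weight, not the exact numbers, and this is not necessarily the tooling used to build the model (merge tools such as mergekit implement the same arithmetic).

```python
import torch
from transformers import AutoModelForCausalLM

BASE = "Qwen/Qwen2.5-1.5B-Instruct"
# Hypothetical weights: the card only says the weight on
# ./harmful_merged_model is negative; exact values are not published.
COMPONENTS = {"./harmful_merged_model": -0.5, "./dare_merged_model": 1.0}

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
base_state = base.state_dict()
merged = {name: t.clone() for name, t in base_state.items()}

for path, lam in COMPONENTS.items():
    tuned = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)
    for name, t in tuned.state_dict().items():
        # Task vector = fine-tuned weights minus base weights; a negative
        # lambda subtracts that component's behavior from the merge.
        merged[name] += lam * (t - base_state[name])

base.load_state_dict(merged)
base.save_pretrained("./model_sft_dare_resta")
```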

Key Characteristics

  • Architecture: Based on Qwen2.5-1.5B-Instruct.
  • Parameter Count: 1.5 billion parameters.
  • Merge Method: Utilizes Task Arithmetic for combining model capabilities.
  • Context Length: Supports a context length of 32768 tokens.

Potential Use Cases

Given its merged composition and base model, sumith2425/model_sft_dare_resta suits general natural language processing tasks where a compact yet capable model is required. The negative weight on the harmful component suggests the merge aims to retain the capabilities of the DARE-merged model while steering generation away from the behaviors the negatively weighted component encodes.
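For reference, here is a minimal generation sketch using the standard Hugging Face transformers API. It assumes the model is available under this repository id and inherits the Qwen2.5-Instruct chat template; it is not an official snippet from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sumith2425/model_sft_dare_resta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen2.5-Instruct models ship a chat template, which the merge inherits.
messages = [{"role": "user", "content": "Summarize what a task-arithmetic merge does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```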