anirvankrishna/model_sft_resta_dare

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 29, 2026 | Architecture: Transformer | Cold

anirvankrishna/model_sft_resta_dare is a 1.5-billion-parameter language model based on Qwen2.5-1.5B-Instruct and developed by anirvankrishna. It was created with the Task Arithmetic merge method, combining the base Qwen model with anirvankrishna/model_harmful_lora_fused, whose contribution is applied with a negative weight. The model supports a 32K-token context length and is intended for text generation, with behavior determined by this merge configuration.


Model Overview

The anirvankrishna/model_sft_resta_dare is a 1.5 billion parameter language model built upon the Qwen2.5-1.5B-Instruct base architecture. It was developed by anirvankrishna using the mergekit tool, specifically employing the Task Arithmetic merge method.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct, providing a strong foundation for language understanding and generation.
  • Merge Method: Uses Task Arithmetic, a technique described in the paper "Editing Models with Task Arithmetic" (arXiv:2212.04089), to combine model weights.
  • Merged Components: Combines anirvankrishna/model_harmful_lora_fused with the Qwen base model. The merge configuration applies a negative weight to the fused model's layers, so the difference between the fused model and the base is subtracted from the base weights rather than added.
  • Context Length: Supports a context window of 32,768 tokens.
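
The merge described above can be illustrated with a minimal sketch of task arithmetic on toy weights. This is not the author's actual mergekit configuration: the function name `task_arithmetic_merge`, the parameter name `layer.weight`, the toy values, and the weight of -1.0 are all assumptions chosen for illustration (the source only states that the weight is negative). For a single merged model, the method computes `merged = base + weight * (finetuned - base)` per parameter.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of values (hypothetical stand-in
# for real model tensors).
StateDict = Dict[str, List[float]]


def task_arithmetic_merge(base: StateDict, finetuned: StateDict, weight: float) -> StateDict:
    """Return base + weight * (finetuned - base) for every parameter.

    The difference (finetuned - base) is the "task vector". A negative weight
    moves the result away from the fine-tuned model's behavior.
    """
    merged: StateDict = {}
    for name, base_vals in base.items():
        tuned_vals = finetuned[name]
        merged[name] = [b + weight * (t - b) for b, t in zip(base_vals, tuned_vals)]
    return merged


# Toy values, not real model weights.
base = {"layer.weight": [1.0, 2.0, 3.0]}
harmful_fused = {"layer.weight": [1.5, 2.0, 2.0]}

# weight=-1.0 is an assumed illustrative value.
merged = task_arithmetic_merge(base, harmful_fused, weight=-1.0)
print(merged)  # {'layer.weight': [0.5, 2.0, 4.0]}
```

With a weight of 0 the result equals the base model, and with a weight of 1 it equals the fine-tuned model. Negative values push each parameter in the opposite direction from the change that fine-tuning introduced.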

Intended Use Cases

This model is suited to text-generation tasks where the behavior produced by the merge (Qwen2.5-1.5B-Instruct combined with the negatively weighted anirvankrishna/model_harmful_lora_fused component) is preferable to that of the base model alone. Its 32,768-token context window accommodates longer inputs and longer responses.