krishdebroy/model_sft_dare_resta
Text generation · Model size: 1.5B · Quant: BF16 · Context length: 32k · Concurrency cost: 1 · Architecture: Transformer · Published: Apr 3, 2026

krishdebroy/model_sft_dare_resta is a 1.5 billion parameter language model merged from Qwen/Qwen2.5-1.5B-Instruct and other models using the Task Arithmetic method. It is configured to integrate, and potentially modify, characteristics from krishdebroy/model_sft_dare and a local harmful LoRA model. With a context length of 32768 tokens, it targets applications that need a blend of capabilities from its constituent models.


Overview

krishdebroy/model_sft_dare_resta was created by krishdebroy using the MergeKit tool and the Task Arithmetic merge method, with Qwen/Qwen2.5-1.5B-Instruct as the base model. This merging approach combines specific characteristics from different pre-trained models into a single checkpoint.
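A Task Arithmetic merge of this kind is typically expressed as a MergeKit YAML config. The sketch below is illustrative only: the actual config for this model is not published here, and the weight values (including the negative weight) are assumptions chosen to show the general shape.

```
merge_method: task_arithmetic
base_model: Qwen/Qwen2.5-1.5B-Instruct
models:
  - model: krishdebroy/model_sft_dare
    parameters:
      weight: 1.0          # hypothetical weight
  - model: /kaggle/working/model_harmful_lora
    parameters:
      weight: -1.0         # hypothetical: a negative weight subtracts this model's task vector
dtype: bfloat16
```

Running `mergekit-yaml config.yml ./output-dir` with such a config would produce the merged checkpoint.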

Key Capabilities

  • Merged Architecture: Integrates features from Qwen/Qwen2.5-1.5B-Instruct with additional models, specifically krishdebroy/model_sft_dare and a local LoRA model (/kaggle/working/model_harmful_lora).
  • Task Arithmetic Method: Utilizes a specific merging technique that allows for weighted combination of model parameters, including negative weighting for certain components.
  • Parameter Count: Operates with 1.5 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens, suitable for processing longer inputs and maintaining conversational coherence over extended interactions.
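The weighted combination behind Task Arithmetic can be sketched in a few lines: each tuned model contributes a "task vector" (its weights minus the base model's), and the merge adds these vectors to the base with per-model weights, where a negative weight subtracts a model's influence. The snippet below is a toy illustration on lists of floats standing in for parameter tensors; the weight values and stand-in numbers are hypothetical.

```python
# Toy Task Arithmetic merge: merged = base + sum(w_i * (tuned_i - base)),
# applied element-wise. Real merges (e.g. via MergeKit) do this per
# parameter tensor across entire checkpoints.

def task_arithmetic(base, tuned_models, weights):
    merged = list(base)
    for tuned, w in zip(tuned_models, weights):
        for i, (b, t) in enumerate(zip(base, tuned)):
            merged[i] += w * (t - b)  # add the weighted task vector
    return merged

base = [0.10, 0.20, 0.30]      # stand-in for Qwen2.5-1.5B-Instruct weights
sft = [0.15, 0.25, 0.35]       # stand-in for krishdebroy/model_sft_dare
harmful = [0.30, 0.40, 0.50]   # stand-in for the local harmful LoRA model

# Positive weight keeps the SFT model's influence; the negative weight
# steers the merge *away* from the harmful model's task vector.
merged = task_arithmetic(base, [sft, harmful], [1.0, -1.0])
```

Here each merged element is `base + 1.0*(sft - base) - 1.0*(harmful - base)`, so the first element comes out to 0.10 + 0.05 - 0.20 = -0.05.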

Good For

  • Experimental Merging: Ideal for researchers and developers interested in exploring the effects of model merging, particularly with the Task Arithmetic method.
  • Customized Behavior: Potentially useful for creating models with highly specific or modified behaviors by combining and adjusting the influence of different source models.
  • Qwen2.5-1.5B-Instruct-Based Applications: Suitable for tasks where the base model's capabilities are desired, with modifications contributed by the merged components.