krishdebroy/model_sft_resta
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 3, 2026 · Architecture: Transformer

krishdebroy/model_sft_resta is a 1.5-billion-parameter language model built with the Task Arithmetic merge method on top of Qwen/Qwen2.5-1.5B-Instruct. It combines two LoRA adapters: krishdebroy/model_sft_lora (positive weight) and a local "harmful" LoRA (negative weight). The result is a model tailored by its merge configuration, with a 32,768-token context length.


Model Overview

krishdebroy/model_sft_resta is a 1.5-billion-parameter language model created by krishdebroy. It was developed using the Task Arithmetic merge method, with Qwen/Qwen2.5-1.5B-Instruct as the base model. Task arithmetic treats each fine-tune's weight changes as a vector that can be added to or subtracted from the base, so capabilities from different adapters can be combined or selectively removed.

Merge Details

The model incorporates two specific LoRA (Low-Rank Adaptation) components:

  • krishdebroy/model_sft_lora: This component was integrated with a weight of 1.0.
  • /kaggle/working/model_harmful_lora: This component was integrated with a weight of -1.0; the negative weight subtracts this adapter's learned direction from the model, suggesting an intent to remove or counteract harmful behavior.
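The weighted merge above can be sketched in plain Python. This is an illustrative toy, not the actual merge code: it assumes each LoRA's contribution has already been materialized as per-parameter deltas, and uses scalar values in place of full weight tensors.

```python
def task_arithmetic_merge(base, adapters):
    """Apply task arithmetic: merged = base + sum(weight * delta).

    `base` maps parameter names to values; `adapters` is a list of
    (delta, weight) pairs. A weight of -1.0 subtracts that adapter's
    learned direction, as with the harmful LoRA in this merge.
    """
    merged = dict(base)
    for delta, weight in adapters:
        for name, d in delta.items():
            merged[name] = merged.get(name, 0.0) + weight * d
    return merged

# Toy scalar "parameters" standing in for full weight tensors.
base = {"w": 0.25}
sft_delta = {"w": 1.0}      # stands in for model_sft_lora (weight 1.0)
harmful_delta = {"w": 0.5}  # stands in for the harmful LoRA (weight -1.0)

merged = task_arithmetic_merge(
    base, [(sft_delta, 1.0), (harmful_delta, -1.0)]
)
print(merged["w"])  # 0.25 + 1.0*1.0 - 1.0*0.5 = 0.75
```

The key property illustrated is that a negative weight does not just ignore an adapter: it actively pushes the merged parameters away from that adapter's learned direction.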

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 billion
  • Context Length: 32768 tokens
  • Merge Method: Task Arithmetic, which adds or subtracts each adapter's weight delta with a per-adapter scaling factor.
  • Data Type: Utilizes bfloat16 for efficient computation.

Potential Use Cases

Given its unique merging strategy, this model is particularly suited for:

  • Experimental Research: Exploring the effects of weighted LoRA merges, especially with negative weights.
  • Specialized Fine-tuning: Applications that benefit from the merged adapters' combined capabilities, such as instruction following with reduced harmful behavior via the negatively weighted "harmful" LoRA.
  • Resource-constrained Environments: Its 1.5B parameter size makes it suitable for deployment where larger models are impractical.