anirvankrishna/model_sft_resta

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 29, 2026 | Architecture: Transformer

anirvankrishna/model_sft_resta is a 1.5-billion-parameter language model based on the Qwen2.5-1.5B-Instruct architecture, with a 32,768-token context length. It is a merge of Qwen/Qwen2.5-1.5B-Instruct and anirvankrishna/model_harmful_lora_fused, created with the Task Arithmetic method. Its defining characteristic is this merge configuration, which makes it useful for exploring how model behavior changes when the weights of different models are combined.


Model Overview

anirvankrishna/model_sft_resta is a 1.5 billion parameter language model built upon the Qwen2.5-1.5B-Instruct base architecture, supporting a 32768-token context length. This model was created using the mergekit tool, specifically employing the Task Arithmetic merge method.
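As a rough guide to what Task Arithmetic computes, the standard formulation treats each model's difference from the base as a task vector and adds a weighted sum of those vectors back to the base. This is the generic textbook form, not copied from this model's configuration; implementation-specific options in mergekit are not modeled here. Below, θ denotes the full set of model weights and w_i the per-model weight set in the merge configuration:

```latex
\theta_{\text{merged}} = \theta_{\text{base}} + \sum_{i} w_i \,\bigl(\theta_i - \theta_{\text{base}}\bigr)
```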

Merge Details

The model is a composite of two components:

  • Qwen/Qwen2.5-1.5B-Instruct (the base model)
  • anirvankrishna/model_harmful_lora_fused

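The merge applies a weight of -1.0 to anirvankrishna/model_harmful_lora_fused over the layer range [0, 28], i.e. all 28 transformer layers of the Qwen2.5-1.5B architecture, relative to the base model. In Task Arithmetic, each component contributes a task vector (its weights minus the base model's weights) scaled by its weight, so a negative weight subtracts that task vector instead of adding it. This configuration is therefore an experimental attempt to remove, rather than add, the behaviors learned by the model_harmful_lora_fused component.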
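The effect of the -1.0 weight can be sketched in plain Python. This is a minimal illustration under stated assumptions: toy dictionaries mapping parameter names to lists of floats stand in for real tensors, parameter names follow the Hugging Face pattern model.layers.<i>., and the helper task_arithmetic is hypothetical, not mergekit's API. Tensors outside the layer range are simply copied from the base as a simplification of the sketch. With weight -1.0, each in-range value becomes base + (-1.0)(finetuned - base) = 2·base - finetuned.

```python
import re

# Assumed Hugging Face-style parameter names, e.g. "model.layers.3.mlp.up_proj.weight".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def task_arithmetic(base, finetuned, weight, layer_range):
    """Hypothetical helper: merged = base + weight * (finetuned - base).

    base, finetuned: dict of parameter name -> list of floats (toy stand-ins
    for tensors). layer_range is half-open: (start, end) covers start..end-1.
    Parameters outside the range, or not in a numbered layer, keep base values.
    """
    start, end = layer_range
    merged = {}
    for name, base_vals in base.items():
        match = LAYER_RE.match(name)
        if match and start <= int(match.group(1)) < end:
            ft_vals = finetuned[name]
            # Task vector = finetuned - base, scaled by weight and added back to base.
            merged[name] = [b + weight * (f - b) for b, f in zip(base_vals, ft_vals)]
        else:
            merged[name] = list(base_vals)
    return merged


base = {
    "model.embed_tokens.weight": [0.5],
    "model.layers.0.mlp.up_proj.weight": [1.0, 2.0],
    "model.layers.28.mlp.up_proj.weight": [1.0],  # toy entry outside the half-open range
}
finetuned = {
    "model.embed_tokens.weight": [0.9],
    "model.layers.0.mlp.up_proj.weight": [1.5, 1.0],
    "model.layers.28.mlp.up_proj.weight": [3.0],
}
merged = task_arithmetic(base, finetuned, weight=-1.0, layer_range=(0, 28))
print(merged["model.layers.0.mlp.up_proj.weight"])  # [0.5, 3.0], i.e. 2*base - finetuned
```

The half-open check (start <= i < end) mirrors the common convention in which a range of [0, 28] selects layers 0 through 27.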

Potential Use Cases

  • Research into Model Merging: Useful for researchers studying the effects of Task Arithmetic, particularly with negative weighting, on model capabilities and biases.
  • Behavioral Modification: Can be used to explore how subtracting a LoRA-fused model's task vector alters the base model's responses or mitigates the characteristics that model learned.
  • Experimental Fine-tuning: Provides a foundation for further fine-tuning or analysis of models created through complex merging strategies.
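For the research use cases above, one simple diagnostic is to compare each tensor's merge delta (merged minus base) with the component's task vector (fine-tuned minus base). The sketch below uses a hypothetical helper, merge_direction, that is not part of mergekit or this model card; toy dictionaries of float lists again stand in for real checkpoint tensors, and the values are made up for illustration. For a weight of -1.0, the delta should point exactly opposite the task vector, giving a cosine similarity of about -1.0.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length float lists (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def merge_direction(base, finetuned, merged):
    """Hypothetical diagnostic: per parameter, return (delta_norm, cosine).

    delta = merged - base, task vector = finetuned - base. A cosine near -1.0
    means the merge moved the tensor opposite to the fine-tune; (0.0, 0.0)
    marks tensors the merge left untouched.
    """
    report = {}
    for name, b in base.items():
        delta = [m - x for m, x in zip(merged[name], b)]
        task = [f - x for f, x in zip(finetuned[name], b)]
        norm = math.sqrt(sum(d * d for d in delta))
        report[name] = (norm, cosine(delta, task))
    return report


base = {"model.layers.0.mlp.up_proj.weight": [1.0, 2.0], "model.norm.weight": [1.0]}
finetuned = {"model.layers.0.mlp.up_proj.weight": [1.5, 1.0], "model.norm.weight": [1.0]}
merged = {"model.layers.0.mlp.up_proj.weight": [0.5, 3.0], "model.norm.weight": [1.0]}  # 2*base - finetuned
report = merge_direction(base, finetuned, merged)
print(report)
```

A tensor whose cosine is far from -1.0 while its delta norm is nonzero would suggest the merge did something other than a plain negation of the task vector for that tensor.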