Sandeep0079/model_sft_resta

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 5, 2026 · Architecture: Transformer

Sandeep0079/model_sft_resta is a 1.5 billion parameter language model produced by merging Qwen/Qwen2.5-1.5B-Instruct with two local fine-tuned models using the Linear method. The merge is configured to balance the characteristics of the base instruction-tuned model against those of the additional fine-tuned components. It targets applications that need general instruction following combined with a potentially modified behavioral profile, and supports a 32768 token context length.


Model Overview

Sandeep0079/model_sft_resta is a 1.5 billion parameter language model created by Sandeep0079 through a merge of pre-trained models using the MergeKit tool. This model leverages the Linear merge method to combine the capabilities of three distinct components:

  • A base instruction-tuned model: Qwen/Qwen2.5-1.5B-Instruct
  • Two local models: ./full_sft_model and ./full_harmful_model

Merge Configuration

The merge was performed with specific weighting to shape the final model's behavior: Qwen/Qwen2.5-1.5B-Instruct and ./full_sft_model were given positive weights (0.35 and 1.0 respectively), while ./full_harmful_model was assigned a negative weight (-0.35). The negative coefficient subtracts that component's contribution, suggesting an intent to retain instruction-following capability while steering the merged model away from behaviors introduced by the 'harmful' model. A minimal sketch of this arithmetic follows.
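As a rough illustration, the Linear method reduces to a per-tensor weighted sum. The sketch below is not MergeKit's actual implementation (which also handles things like dtype casting and optional weight normalization); it only shows the arithmetic implied by the configuration above, with hypothetical state-dict arguments standing in for the three source checkpoints.

```python
import torch

# Weights taken from the merge configuration described above.
MERGE_WEIGHTS = {
    "Qwen/Qwen2.5-1.5B-Instruct": 0.35,
    "./full_sft_model": 1.0,
    "./full_harmful_model": -0.35,  # negative weight subtracts this model's contribution
}

def linear_merge(state_dicts: dict[str, dict[str, torch.Tensor]],
                 weights: dict[str, float]) -> dict[str, torch.Tensor]:
    """Merge identically-shaped checkpoints as a weighted sum, tensor by tensor."""
    param_names = next(iter(state_dicts.values())).keys()
    merged = {}
    for name in param_names:
        merged[name] = sum(weights[model] * sd[name].float()
                           for model, sd in state_dicts.items())
    return merged
```

One algebraic observation: if ./full_harmful_model was itself fine-tuned from the same base (an assumption the card does not confirm), writing H = B + Δ gives 0.35·B + 1.0·S − 0.35·H = S − 0.35·Δ, i.e. the configuration is equivalent to taking ./full_sft_model and subtracting 0.35 times the 'harmful' fine-tune's delta from the base.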

Key Characteristics

  • Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and maintaining conversational coherence over extended interactions.
  • Merge Method: Utilizes the Linear merge method, which computes each merged parameter tensor as a weighted sum of the corresponding tensors from the source models (as sketched above).

Potential Use Cases

This model may suit developers who want a compact instruction-tuned model with custom behavioral adjustments. Its composition makes it potentially useful for (see the usage sketch after this list):

  • Applications requiring a modified response profile from a base Qwen model.
  • Experiments in model merging to fine-tune specific output characteristics.
  • General instruction-following tasks where a 1.5B parameter model is sufficient.
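Assuming the repository ships standard Hugging Face-format weights and inherits the Qwen2.5 chat template (the card does not state this explicitly), inference should follow the usual transformers pattern:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Sandeep0079/model_sft_resta"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain linear model merging in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```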