Pam5/model_sft_resta
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Mar 27, 2026 · Architecture: Transformer

Pam5/model_sft_resta is a 1.5 billion parameter language model created by Pam5 using a linear merge of pre-trained models, including Qwen/Qwen2.5-1.5B-Instruct. This model features a 32768 token context length and is designed for general language tasks, leveraging its merged architecture for balanced performance.


Model Overview

Pam5/model_sft_resta is a 1.5 billion parameter language model developed by Pam5. It was created using the MergeKit tool, specifically employing the Linear merge method to combine several base models. The primary component in this merge is Qwen/Qwen2.5-1.5B-Instruct, alongside two local models: ./full_sft_model and ./full_harmful_model.
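A MergeKit linear merge of this kind is typically described by a small YAML recipe. The sketch below uses the model paths named above, but the weights and dtype are illustrative assumptions, not the actual recipe used for this model:

```yaml
# Hypothetical MergeKit recipe for a linear merge (weights are assumed, not published)
models:
  - model: Qwen/Qwen2.5-1.5B-Instruct
    parameters:
      weight: 0.5   # assumed weight
  - model: ./full_sft_model
    parameters:
      weight: 0.3   # assumed weight
  - model: ./full_harmful_model
    parameters:
      weight: 0.2   # assumed weight
merge_method: linear
dtype: bfloat16
```

With a recipe like this, `mergekit-yaml recipe.yml ./output_model` would produce the merged checkpoint.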

Key Characteristics

  • Architecture: A merged model, combining the strengths of its constituent models through a linear weighting approach.
  • Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and generating more coherent, extended outputs.
  • Merge Method: Utilizes the Linear merge method, which computes a weighted average of the base models' parameters to blend their capabilities.
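The linear merge listed above is, at its core, a weighted average of corresponding parameter tensors from each source model. A minimal NumPy sketch of that operation (the weights and tiny tensors here are purely illustrative, not the actual merge recipe):

```python
import numpy as np

def linear_merge(tensors, weights):
    """Linearly merge same-shape parameter tensors: sum_i w_i * t_i.

    tensors: list of arrays, one per source model
    weights: list of floats, typically summing to 1.0
    """
    merged = np.zeros_like(tensors[0], dtype=np.float64)
    for t, w in zip(tensors, weights):
        merged += w * np.asarray(t, dtype=np.float64)
    return merged

# Illustrative: merge one weight matrix from three hypothetical models
a = np.full((2, 2), 1.0)  # e.g. a tensor from Qwen/Qwen2.5-1.5B-Instruct
b = np.full((2, 2), 2.0)  # e.g. the same tensor from ./full_sft_model
c = np.full((2, 2), 4.0)  # e.g. the same tensor from ./full_harmful_model
merged = linear_merge([a, b, c], [0.5, 0.3, 0.2])  # assumed weights
```

In the real merge this averaging is applied tensor-by-tensor across every parameter in the model, which is why the merged model keeps the same 1.5B-parameter architecture as its constituents.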

Intended Use Cases

This model is suitable for a variety of general language generation and understanding tasks, benefiting from the instruction-tuned nature of its Qwen component and the specific contributions of the other merged models. Its 32768-token context window makes it particularly effective for applications requiring extensive contextual understanding.