allout2726/model_sft_resta

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 4, 2026 · Architecture: Transformer

allout2726/model_sft_resta is a 1.5 billion parameter language model with a 32768-token context length, created by allout2726. It is a merge of pre-trained models built on Qwen/Qwen2.5-1.5B-Instruct, produced with the Task Arithmetic merge method, which combines two component models to reinforce or suppress specific behaviors.


Model Overview

model_sft_resta inherits the architecture of its Qwen/Qwen2.5-1.5B-Instruct base and retains the base model's 32768-token context window, making it suitable for processing longer inputs such as extended documents or multi-turn conversations.
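If the repository follows the standard Hugging Face layout with a bundled tokenizer (an assumption; the card does not show the file listing), the model can be loaded with transformers like any other Qwen2.5 derivative. A minimal usage sketch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allout2726/model_sft_resta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the quantization listed in the model metadata.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Qwen2.5-Instruct derivatives expect the chat template for prompting.
messages = [{"role": "user", "content": "Explain Task Arithmetic merging in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```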

Key Characteristics

  • Merge Method: This model was constructed with the Task Arithmetic merge method, which combines models by adding and subtracting weight deltas (task vectors) relative to a shared base.
  • Base Model: The foundation of model_sft_resta is the robust Qwen/Qwen2.5-1.5B-Instruct.
  • Merged Components: The merge combined two checkpoints, /kaggle/working/temp_sft_full and /kaggle/working/temp_harmful_full, with weights of 1.0 and -1.0 respectively. In Task Arithmetic terms, this adds the SFT model's task vector to the base while subtracting the harmful model's, an arrangement typically used to reinforce desired behavior and suppress unwanted behavior (see the sketch after this list).
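For illustration, here is a minimal Task Arithmetic sketch in Python. The checkpoint paths and the +1.0/-1.0 weights come from the configuration above; the merge loop, the output directory name, and the use of transformers/torch are assumptions, since the card does not specify the tooling (frameworks such as mergekit implement the same arithmetic).

```python
import torch
from transformers import AutoModelForCausalLM

BASE_ID = "Qwen/Qwen2.5-1.5B-Instruct"
# Component checkpoints and their Task Arithmetic weights, per the card.
COMPONENTS = [
    ("/kaggle/working/temp_sft_full", 1.0),       # add SFT behavior
    ("/kaggle/working/temp_harmful_full", -1.0),  # subtract harmful behavior
]

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
base_state = base.state_dict()
merged_state = {name: param.clone() for name, param in base_state.items()}

for path, weight in COMPONENTS:
    component = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)
    comp_state = component.state_dict()
    for name, base_param in base_state.items():
        # Task vector = fine-tuned weights minus base weights;
        # scale it by the component's weight and accumulate.
        merged_state[name] += weight * (comp_state[name] - base_param)
    del component, comp_state  # free memory between components

base.load_state_dict(merged_state)
base.save_pretrained("model_sft_resta")
```

With these weights the result simplifies to base + (sft - base) - (harmful - base): the SFT delta is applied in full while the harmful delta is removed in full, which matches the apparent intent of the configuration.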

Potential Use Cases

Given its construction via Task Arithmetic, this model could be particularly useful for:

  • Experimental Fine-tuning: Exploring how specific behavioral traits or knowledge from different models can be combined or adjusted.
  • Specialized Applications: Developing models with nuanced responses by leveraging the additive and subtractive properties of Task Arithmetic merging.