allout2726/model_sft_resta
Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Concurrency Cost: 1 · Architecture: Transformer · Published: Apr 4, 2026
allout2726/model_sft_resta is a 1.5-billion-parameter language model with a 32768-token context length, created by allout2726. It is a merge of pre-trained models built on Qwen/Qwen2.5-1.5B-Instruct as its base. The model was produced with the Task Arithmetic merge method, combining two component models to potentially refine or suppress specific behavioral traits.
Model Overview
allout2726/model_sft_resta is a 1.5-billion-parameter language model built upon the Qwen/Qwen2.5-1.5B-Instruct base model. Its 32768-token context length makes it suitable for processing longer inputs.
Key Characteristics
- Merge Method: This model was constructed using the Task Arithmetic merge method, a technique designed to combine the capabilities of multiple pre-trained models.
- Base Model: The foundation of `model_sft_resta` is the robust `Qwen/Qwen2.5-1.5B-Instruct`.
- Merged Components: The merge process incorporated two specific models, `/kaggle/working/temp_sft_full` and `/kaggle/working/temp_harmful_full`, with assigned weights of `1.0` and `-1.0` respectively. This configuration suggests an intent to enhance characteristics from the first component while subtracting those of the second.
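The weighted merge described above can be sketched in a few lines. Task Arithmetic forms "task vectors" (the difference between each fine-tuned model and the base) and adds their weighted sum back onto the base weights. The snippet below is a minimal illustration with toy scalar parameters, not the actual merge pipeline; real merges operate on full checkpoints via tools such as mergekit, and the dict names here are hypothetical stand-ins.

```python
def task_arithmetic_merge(base, tuned_models, weights):
    """merged = base + sum_i w_i * (tuned_i - base)

    base: dict of parameter values for the base model
    tuned_models: list of dicts with the same keys as `base`
    weights: per-model merge weights (here 1.0 and -1.0)
    """
    merged = {}
    for name, base_param in base.items():
        delta = sum(
            w * (tuned[name] - base_param)
            for tuned, w in zip(tuned_models, weights)
        )
        merged[name] = base_param + delta
    return merged

# Toy scalar "checkpoints" standing in for the real models:
base = {"w": 1.0}     # Qwen/Qwen2.5-1.5B-Instruct
sft = {"w": 1.6}      # stands in for /kaggle/working/temp_sft_full
harmful = {"w": 1.2}  # stands in for /kaggle/working/temp_harmful_full

merged = task_arithmetic_merge(base, [sft, harmful], weights=[1.0, -1.0])
# With weights 1.0 and -1.0 this reduces to base + (sft - harmful):
# 1.0 + (1.6 - 1.0) - (1.2 - 1.0) = 1.4
```

Note that the `1.0` / `-1.0` weight pair makes the merge equivalent to adding the SFT task vector and subtracting the "harmful" task vector, which matches the card's stated intent of enhancing one behavior while mitigating another.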
Potential Use Cases
Given its construction via Task Arithmetic, this model could be particularly useful for:
- Experimental Fine-tuning: Exploring how specific behavioral traits or knowledge from different models can be combined or adjusted.
- Specialized Applications: Developing models with nuanced responses by leveraging the additive and subtractive properties of Task Arithmetic merging.