itsmepv/model_sft_resta
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32K · Published: Apr 1, 2026 · Architecture: Transformer · Cold

itsmepv/model_sft_resta is a 1.5 billion parameter language model created by itsmepv, built by merging pre-trained models with the Task Arithmetic method. It uses Qwen/Qwen2.5-1.5B-Instruct as its base model, adding the 'fused_sft_full' component and subtracting the 'fused_harmful_full' component, and supports a 32K token context length.


Model Overview

itsmepv/model_sft_resta is a 1.5 billion parameter language model developed by itsmepv. It was constructed using the MergeKit tool, specifically employing the Task Arithmetic merge method. The base model for this merge is Qwen/Qwen2.5-1.5B-Instruct, which provides a robust foundation with a 32,768 token context length.

Merge Details

This model integrates two distinct components: ./fused_sft_full and ./fused_harmful_full. The Task Arithmetic method was applied with a weight of 1.0 for fused_sft_full and a weight of -1.0 for fused_harmful_full. In other words, the fused_sft_full task vector is added to the base model while the fused_harmful_full task vector is subtracted, a configuration that suggests a deliberate attempt to retain the supervised fine-tuning behavior while removing the behavior encoded in the harmful component. The bfloat16 dtype indicates an optimization for efficiency and performance during inference.
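The actual MergeKit configuration file is not shown on this page, but based on the method, weights, paths, and dtype described above, it likely resembled the following sketch (field values are inferred from the text, not copied from the repository):

```yaml
# Hypothetical reconstruction of the merge config described above.
merge_method: task_arithmetic
base_model: Qwen/Qwen2.5-1.5B-Instruct
models:
  - model: ./fused_sft_full
    parameters:
      weight: 1.0
  - model: ./fused_harmful_full
    parameters:
      weight: -1.0
dtype: bfloat16
```

With a config like this, MergeKit computes each component's task vector (fine-tuned weights minus base weights), scales it by the given weight, and adds the result back onto the base model.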

Potential Use Cases

Given its merging strategy, model_sft_resta is tailored to applications that benefit from the behavior added by 'fused_sft_full' and the behavior removed by subtracting 'fused_harmful_full'. Developers should weigh this specialized construction when evaluating the model's suitability for general-purpose tasks.
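The effect of the +1.0/-1.0 weights can be illustrated with a toy sketch of Task Arithmetic. The arrays below are invented stand-ins for full parameter tensors; only the arithmetic mirrors the merge described above:

```python
import numpy as np

# Toy stand-ins for model parameter vectors (values are illustrative only).
base = np.array([0.10, 0.20, 0.30])           # Qwen2.5-1.5B-Instruct
fused_sft = np.array([0.15, 0.25, 0.28])      # ./fused_sft_full
fused_harmful = np.array([0.12, 0.18, 0.33])  # ./fused_harmful_full

# Task vectors: each fine-tuned model minus the shared base.
tau_sft = fused_sft - base
tau_harmful = fused_harmful - base

# Task Arithmetic merge with weights 1.0 and -1.0:
# the SFT delta is added, the harmful delta is subtracted.
merged = base + 1.0 * tau_sft - 1.0 * tau_harmful
print(merged)
```

Where the two task vectors point in the same direction, the subtraction cancels part of the SFT gain; where they oppose each other, the merged parameters move further than either fine-tune alone.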