ARAVIND8179986644/model_sft_resta

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 5, 2026Architecture:Transformer Warm

ARAVIND8179986644/model_sft_resta is a 1.5 billion parameter language model created by ARAVIND8179986644 using the Task Arithmetic merge method. Based on Qwen/Qwen2.5-1.5B-Instruct, this model combines two fine-tuned components to achieve its specific characteristics. It is designed for applications requiring a compact yet capable model, leveraging its 32768 token context length.

Loading preview...

Model Overview

ARAVIND8179986644/model_sft_resta is a 1.5 billion parameter language model developed by ARAVIND8179986644. This model was constructed using the Task Arithmetic merge method, a technique that combines the weights of multiple pre-trained models to achieve a desired behavior. The base model for this merge was Qwen/Qwen2.5-1.5B-Instruct, known for its instruction-following capabilities.

Merge Details

The model integrates two distinct components: ./sft_full and ./harmful_full. The Task Arithmetic configuration applied a weight of 1.0 to ./sft_full and a weight of -1.0 to ./harmful_full. This specific weighting suggests an intent to enhance certain behaviors while potentially mitigating others, making it a specialized derivative of its base model.

Key Characteristics

  • Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and maintaining conversational coherence over extended interactions.
  • Merge Method: Utilizes the Task Arithmetic method, allowing for fine-grained control over the model's emergent properties by combining specific task-trained models.

Use Cases

This model is suitable for applications where a compact, instruction-tuned model with a long context window is beneficial. Its unique merge configuration implies potential for specialized performance in areas influenced by the combined components, making it a candidate for tasks requiring nuanced responses based on its specific training.