Model Overview
Tasmay-Tib/qwen2.5-1.5b-medical-sft-resta is a 1.5-billion-parameter language model built on the Qwen2.5-1.5B-Instruct base. It was developed by Tasmay-Tib using the Task Arithmetic merge method, a technique that adds and subtracts the weight differences ("task vectors") between fine-tuned models and a shared base to achieve specialized capabilities.
Merge Details
This model is a composite of three distinct components:
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Merged Component 1: outputs/part2/model_sft_full (applied with a weight of 1.0)
- Merged Component 2: outputs/part2/model_harmful_full (applied with a weight of -1.0)
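Merges of this kind are typically described declaratively. The sketch below shows one plausible configuration in the style of the mergekit library's task_arithmetic method; the tooling choice and the dtype are assumptions, while the model paths and weights come from the details above.

```yaml
# Illustrative mergekit-style config; tooling and dtype are assumptions,
# not confirmed by the model card.
merge_method: task_arithmetic
base_model: Qwen/Qwen2.5-1.5B-Instruct
models:
  - model: outputs/part2/model_sft_full
    parameters:
      weight: 1.0
  - model: outputs/part2/model_harmful_full
    parameters:
      weight: -1.0
dtype: bfloat16
```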
The negative weight on model_harmful_full subtracts that component's task vector from the base, suggesting an intent to reduce or mitigate the behaviors associated with it, while the positive weight on model_sft_full reinforces the behaviors learned during supervised fine-tuning. This approach allows fine-grained control over the merged model's final behavior and specialization.
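Concretely, Task Arithmetic computes each merged parameter as the base value plus a weighted sum of per-model deltas. The toy sketch below illustrates the arithmetic on scalar "parameters" (real merges operate on full weight tensors); the function name and example values are hypothetical.

```python
# Toy sketch of Task Arithmetic: merged = base + sum_i w_i * (tuned_i - base).
# Scalars stand in for weight tensors purely for illustration.

def task_arithmetic(base, tuned_models, weights):
    """Merge per-parameter: base value plus weighted task-vector deltas."""
    merged = {}
    for name, base_val in base.items():
        delta = sum(
            w * (tuned[name] - base_val)
            for tuned, w in zip(tuned_models, weights)
        )
        merged[name] = base_val + delta
    return merged

# Hypothetical values standing in for the three checkpoints in this merge.
base    = {"w": 1.0}
sft     = {"w": 1.4}   # stands in for model_sft_full
harmful = {"w": 1.1}   # stands in for model_harmful_full

# Weights 1.0 and -1.0, as in the merge described above:
merged = task_arithmetic(base, [sft, harmful], [1.0, -1.0])
print(round(merged["w"], 6))  # 1.0 + (1.4 - 1.0) - (1.1 - 1.0) = 1.3
```

The subtraction is what implements the mitigation: whatever direction model_harmful_full moved a parameter away from the base, the merge moves it the same distance in the opposite direction.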
Key Characteristics
- Architecture: Qwen2.5-based, 1.5 billion parameters.
- Context Length: 32,768 tokens.
- Merge Method: Utilizes Task Arithmetic for targeted capability integration.
Potential Use Cases
Given its specialized merging strategy, this model is likely optimized for tasks where the specific contributions of model_sft_full are beneficial and where the characteristics of model_harmful_full are intended to be suppressed. Developers should evaluate its performance against their specific requirements, particularly in domains where fine-tuned behavioral adjustments are critical.