The krishdebroy/model_sft_resta is a 1.5 billion parameter language model, merged using the Task Arithmetic method with Qwen/Qwen2.5-1.5B-Instruct as its base. The merge combines krishdebroy/model_sft_lora (positive weight) with a local "harmful" LoRA (negative weight, i.e. subtracted). It is designed for specialized applications derived from this merging configuration and offers a 32768 token context length.
Model Overview
The krishdebroy/model_sft_resta is a 1.5 billion parameter language model created by krishdebroy. It was developed using the Task Arithmetic merge method, leveraging Qwen/Qwen2.5-1.5B-Instruct as its foundational base model. This merging technique allows for the combination of distinct model characteristics.
Merge Details
The model incorporates two specific LoRA (Low-Rank Adaptation) components:
- krishdebroy/model_sft_lora: This component was integrated with a weight of 1.0.
- /kaggle/working/model_harmful_lora: This component was integrated with a weight of -1.0, suggesting an intent to mitigate or invert certain characteristics from this specific LoRA.
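Conceptually, a task-arithmetic merge adds each component's parameter delta (relative to the base) scaled by its weight, so a weight of -1.0 subtracts that component's learned behavior. A minimal toy sketch of that arithmetic (plain Python lists standing in for real parameter tensors; the values and names are illustrative, not taken from the actual checkpoints):

```python
def task_arithmetic_merge(base, tuned_models, weights):
    """Per-parameter merge: merged = base + sum_i w_i * (tuned_i - base)."""
    merged = []
    for idx, b in enumerate(base):
        delta = sum(w * (m[idx] - b) for m, w in zip(tuned_models, weights))
        merged.append(b + delta)
    return merged

# Toy stand-ins for flattened parameter vectors.
base = [0.5, -0.2, 1.0]
sft = [0.7, -0.1, 1.2]      # stands in for the SFT component applied to the base
harmful = [0.6, -0.4, 1.1]  # stands in for the harmful LoRA applied to the base

# Weights mirror this model's merge: +1.0 for the SFT component, -1.0 for the harmful one.
merged = task_arithmetic_merge(base, [sft, harmful], [1.0, -1.0])
print(merged)
```

With the -1.0 weight, any direction the harmful component moved the parameters in is reversed relative to the base, which is the mechanism behind using negative weights for mitigation.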
Key Characteristics
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 billion
- Context Length: 32768 tokens
- Merge Method: Task Arithmetic, enabling fine-grained control over feature integration.
- Data Type: bfloat16 for efficient computation.
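The characteristics above correspond to fields in a typical merge recipe. A sketch of what such a recipe could look like in a mergekit-style YAML config (illustrative only; the exact schema and model references should be checked against the merge tool actually used):

```yaml
merge_method: task_arithmetic
base_model: Qwen/Qwen2.5-1.5B-Instruct
models:
  - model: krishdebroy/model_sft_lora
    parameters:
      weight: 1.0
  - model: /kaggle/working/model_harmful_lora
    parameters:
      weight: -1.0
dtype: bfloat16
```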
Potential Use Cases
Given its unique merging strategy, this model is particularly suited for:
- Experimental Research: Exploring the effects of weighted LoRA merges, especially with negative weights.
- Specialized Fine-tuning: Applications requiring a blend of capabilities from the merged LoRA models, potentially for specific instruction-following or content moderation tasks based on the 'harmful' LoRA's negative weighting.
- Resource-constrained Environments: Its 1.5B parameter size makes it suitable for deployment where larger models are impractical.