Model Overview
Sandeep0079/model_sft_resta is a 1.5-billion-parameter language model created by Sandeep0079 by merging pre-trained models with the MergeKit tool. It uses the linear merge method to combine three components:
- A base instruction-tuned model: Qwen/Qwen2.5-1.5B-Instruct
- Two local models: ./full_sft_model and ./full_harmful_model
Merge Configuration
The merge was performed with specific weighting to shape the final model's behavior. The Qwen/Qwen2.5-1.5B-Instruct component and ./full_sft_model were given positive weights (0.35 and 1.0 respectively), while ./full_harmful_model was assigned a negative weight (-0.35). In a linear merge, a negative weight subtracts that model's parameter contribution from the result, so this configuration suggests an intent to retain instruction-following capability while steering the merged model away from behaviors associated with the 'harmful' component.
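A MergeKit configuration matching the weights described above might look like the following sketch. The structure follows MergeKit's linear-merge config schema; the `dtype` value is an assumption, as the actual config file is not included in the card.

```yaml
# Hypothetical MergeKit config reconstructing the described merge.
# dtype is assumed; the original configuration is not published.
merge_method: linear
models:
  - model: Qwen/Qwen2.5-1.5B-Instruct
    parameters:
      weight: 0.35
  - model: ./full_sft_model
    parameters:
      weight: 1.0
  - model: ./full_harmful_model
    parameters:
      weight: -0.35
dtype: bfloat16
```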
Key Characteristics
- Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and maintaining conversational coherence over extended interactions.
- Merge Method: Utilizes the Linear merge method, which combines model weights directly based on specified coefficients.
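The linear merge described above reduces to a per-tensor weighted sum. The sketch below illustrates the arithmetic with toy two-element tensors; the tensor names and values are purely illustrative, not taken from the actual checkpoints.

```python
import numpy as np

# Toy stand-ins for the three source models' state dicts
# (real checkpoints contain many tensors; names here are illustrative).
qwen_instruct = {"layer.weight": np.array([0.2, -0.4])}
full_sft      = {"layer.weight": np.array([0.5,  0.1])}
full_harmful  = {"layer.weight": np.array([0.8,  0.8])}

# Coefficients from the merge configuration, including the
# negative weight that subtracts the harmful model's contribution.
coeffs = [0.35, 1.0, -0.35]
sources = [qwen_instruct, full_sft, full_harmful]

# Linear merge: each output tensor is the coefficient-weighted
# sum of the corresponding tensors across all source models.
merged = {
    name: sum(c * src[name] for c, src in zip(coeffs, sources))
    for name in qwen_instruct
}

print(merged["layer.weight"])  # ≈ [0.29, -0.32]
```

Because the operation is elementwise and model-independent, it scales to full checkpoints tensor by tensor without loading activations or running inference.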
Potential Use Cases
This model is suitable for developers seeking a compact, instruction-tuned model with custom behavioral adjustments. Its composition makes it potentially useful for:
- Applications requiring a modified response profile from a base Qwen model.
- Experiments in model merging to fine-tune specific output characteristics.
- General instruction-following tasks where a 1.5B parameter model is sufficient.