wvnvwn/qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3
wvnvwn/qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3 is a 7.6-billion-parameter instruction-tuned language model, produced by a linear merge of Qwen/Qwen2.5-7B-Instruct and two specialized Qwen 2.5 7B instruction-tuned variants. It uses the Qwen 2.5 architecture and supports a context length of 32768 tokens. The merge weights are chosen to emphasize the capabilities of the specialized variants relative to the base instruct model.
Model Overview
This model, wvnvwn/qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3, is a 7.6-billion-parameter instruction-tuned language model built on the Qwen 2.5 architecture. It was created with mergekit's linear merge method, combining three instruction-tuned checkpoints to achieve its characteristics.
Merge Details
The model is a composite of:
- Qwen/Qwen2.5-7B-Instruct: The base instruction-tuned model from Qwen.
- wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5: A specialized instruction-tuned variant.
- wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5: Another specialized instruction-tuned variant, likely focused on mathematical reasoning given the 'gsm8k' identifier (GSM8K is a grade-school math word-problem benchmark).
Configuration
The merge utilized a specific weighting scheme:
- wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5 contributed with a weight of 1.0.
- wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5 contributed with a weight of 0.3.
- Qwen/Qwen2.5-7B-Instruct had a negative weight of -0.3.
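The weighting above corresponds to a mergekit configuration along the following lines. This is a reconstruction from the stated weights, not the exact file used to build the model; the `dtype` and `normalize` settings in particular are assumptions:

```yaml
merge_method: linear
models:
  - model: wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5
    parameters:
      weight: 1.0
  - model: wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5
    parameters:
      weight: 0.3
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: -0.3
dtype: bfloat16     # assumed; match the source checkpoints
normalize: false    # assumed; the weights already sum to 1.0
```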
This configuration emphasizes the characteristics of the specialized SSFT models while subtracting part of the base Qwen 2.5 Instruct model. Because the weights are 1.0, 0.3, and -0.3, the merge is arithmetically equivalent to taking the gsm8k model and adding 0.3 times the difference between the SSFT variant and the base model; the negative base weight cancels the base-model component of the SSFT contribution, layering its fine-tuning delta on top of the gsm8k model. Developers can explore this model for tasks where a blend of these instruction-tuned capabilities is desired.
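Assuming a plain per-tensor weighted sum (which is what an unnormalized linear merge computes), the merge reduces to simple parameter arithmetic. A minimal NumPy sketch with toy stand-in tensors illustrates the equivalence described above:

```python
import numpy as np

# Toy stand-ins for a single weight tensor from each checkpoint.
base = np.array([1.0, 2.0, 3.0])    # Qwen/Qwen2.5-7B-Instruct
ssft = np.array([1.5, 2.5, 3.5])    # ...-SSFT-lr5e-5
gsm8k = np.array([2.0, 1.0, 4.0])   # ...-SSFT-gsm8k-lr5e-5

# Linear merge with the card's weights: 1.0, 0.3, -0.3.
merged = 1.0 * gsm8k + 0.3 * ssft - 0.3 * base

# Algebraically equivalent: the gsm8k model plus 0.3x the SSFT
# fine-tuning delta relative to the base model.
equivalent = gsm8k + 0.3 * (ssft - base)

assert np.allclose(merged, equivalent)
```

The sketch only demonstrates the arithmetic; the real merge applies this operation tensor-by-tensor across all model parameters.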