wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5
wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5 is a 7.6-billion-parameter language model based on the Qwen2.5-7B architecture, created by wvnvwn through a linear merge of Qwen/Qwen2.5-7B and two Qwen2.5-7B SSFT fine-tunes. One of the merged components is tuned on GSM8K, which suggests a focus on mathematical reasoning alongside general language tasks. Its 32K context length supports longer inputs.
Model Overview
This model, wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5, is a 7.6-billion-parameter language model derived from the Qwen2.5-7B family. It was created by wvnvwn with the mergekit tool, using the linear merge method to combine three checkpoints: the base model and two fine-tuned variants of it.
Merge Details
The model is a composite of:
- Qwen/Qwen2.5-7B: The foundational Qwen2.5-7B model.
- wvnvwn/qwen-2.5-7B-SSFT-lr3e-5: A specialized Qwen2.5-7B variant, likely fine-tuned for specific tasks.
- wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5: Another specialized Qwen2.5-7B variant, explicitly fine-tuned on the GSM8K dataset, indicating a focus on mathematical problem-solving and reasoning.
Configuration
The merge used the linear method, with a fixed weight for each component across layers 0 to 28: wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5 at 1.0, wvnvwn/qwen-2.5-7B-SSFT-lr3e-5 at 0.5, and the base Qwen/Qwen2.5-7B at -0.5. The weights sum to 1.0, and the combination is algebraically equivalent to the GSM8K-tuned model plus 0.5 times the difference between the SSFT model and the base model. In other words, the GSM8K-tuned weights are kept in full and shifted by half of the change that SSFT training made to the base, which is consistent with the scale0.5 suffix in the model name.
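The weighting above can be checked with a small numeric sketch. This is not mergekit code and does not touch real checkpoints: the three short lists are made-up stand-ins for one parameter tensor from each model, and the names `linear_merge` and `task_vector_form` are hypothetical helpers introduced only for illustration. It shows that the weighted sum with weights -0.5, 0.5, and 1.0 gives the same result as adding a half-scaled (SSFT minus base) difference to the GSM8K-tuned values.

```python
# Illustrative only: tiny lists stand in for real weight tensors.
# Weights are the ones stated in the Configuration section.
WEIGHTS = {"base": -0.5, "ssft": 0.5, "gsm8k": 1.0}


def linear_merge(params, weights):
    """Element-wise weighted sum of same-length parameter vectors."""
    length = len(next(iter(params.values())))
    return [
        sum(weights[name] * params[name][i] for name in weights)
        for i in range(length)
    ]


def task_vector_form(params, scale):
    """Same merge written as gsm8k + scale * (ssft - base)."""
    return [
        g + scale * (s - b)
        for g, s, b in zip(params["gsm8k"], params["ssft"], params["base"])
    ]


# Made-up values, chosen so the arithmetic is exact in floating point.
toy = {
    "base": [1.0, 2.0, -4.0],
    "ssft": [2.0, 2.0, -2.0],
    "gsm8k": [1.5, 3.0, -4.0],
}

merged = linear_merge(toy, WEIGHTS)
alt = task_vector_form(toy, 0.5)

print(merged)  # [2.0, 3.0, -3.0]
print(alt)     # [2.0, 3.0, -3.0]
```

Because the weights already sum to 1.0, dividing by the weight sum (if the merge tool normalizes weights) would leave them unchanged.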
Potential Use Cases
Given the inclusion of a GSM8K-tuned component, this model is likely well-suited for:
- Mathematical reasoning and problem-solving
- General language understanding and generation tasks
- Applications requiring a blend of general knowledge and numerical aptitude