wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.3
The wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.3 model is a 7.6 billion parameter language model based on the Qwen2.5-7B architecture, created by wvnvwn through a linear merge of several pre-trained models. The merge combines fine-tuned versions of Qwen2.5-7B, including one trained on GSM8K, suggesting a focus on mathematical reasoning alongside general language tasks. With a context length of 32768 tokens, it is designed for applications requiring robust performance across diverse linguistic challenges.
Model Overview
The wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.3 is a 7.6 billion parameter language model built upon the Qwen2.5-7B architecture. It was created by wvnvwn using the linear merge method via mergekit, combining multiple specialized versions of the base model.
Merge Details
This model is a composite of three distinct Qwen2.5-7B variants:
- Qwen/Qwen2.5-7B: The foundational base model.
- wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5: A version specifically fine-tuned, likely for mathematical reasoning tasks, given the 'gsm8k' identifier (GSM8K is a dataset for grade school math problems).
- wvnvwn/qwen-2.5-7B-SSFT-lr3e-5: Another fine-tuned variant, contributing to general language capabilities.
The merge configuration assigned a weight of 1.0 to the GSM8K-tuned model, 0.3 to the general fine-tuned model, and -0.3 to the base Qwen2.5-7B, applied uniformly across all 28 transformer layers. Giving the base model a negative weight effectively subtracts a scaled copy of its parameters, so the merge amplifies the changes introduced by fine-tuning rather than averaging them back toward the original model.
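A mergekit configuration for this merge would look roughly like the following sketch. The weights and model names come from the merge details above; the dtype setting is an assumption, and the exact file shipped with the model may differ:

```yaml
# Sketch of a mergekit linear-merge config (dtype is assumed, not confirmed).
merge_method: linear
models:
  - model: wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5
    parameters:
      weight: 1.0
  - model: wvnvwn/qwen-2.5-7B-SSFT-lr3e-5
    parameters:
      weight: 0.3
  - model: Qwen/Qwen2.5-7B
    parameters:
      weight: -0.3
dtype: bfloat16
```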
Potential Use Cases
Given its composition, this model is likely well-suited for:
- Mathematical Reasoning: The inclusion of a GSM8K-tuned component suggests improved performance on quantitative and logical problem-solving.
- General Language Understanding and Generation: Benefiting from the Qwen2.5-7B base and additional fine-tuning.
- Applications requiring a balance of general knowledge and specific reasoning skills.
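Under the hood, mergekit's linear method is just a per-parameter weighted sum. The sketch below illustrates that arithmetic on made-up toy values; real merges apply the same formula to full weight tensors of the three source checkpoints:

```python
# Toy illustration of a linear model merge: merged parameters are the
# element-wise weighted sum of each source model's parameters.
# All numeric values below are invented stand-ins for real weight tensors.

def linear_merge(tensors, weights):
    """Element-wise weighted sum of equally-shaped parameter lists."""
    merged = [0.0] * len(tensors[0])
    for tensor, weight in zip(tensors, weights):
        for i, value in enumerate(tensor):
            merged[i] += weight * value
    return merged

# Hypothetical parameter slices from the three source models.
gsm8k_ft   = [0.20, -0.10, 0.05]  # wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5
general_ft = [0.18, -0.08, 0.07]  # wvnvwn/qwen-2.5-7B-SSFT-lr3e-5
base       = [0.15, -0.05, 0.06]  # Qwen/Qwen2.5-7B

# Weights from the merge details: 1.0, 0.3, and -0.3 for the base model.
merged = linear_merge([gsm8k_ft, general_ft, base], [1.0, 0.3, -0.3])
print(merged)
```

Because the three weights sum to 1.0, the merged parameters stay on the same scale as a single model while the negative base weight pushes the result away from the untuned checkpoint.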