wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quantization: FP8 · Context Length: 32K · Published: Apr 29, 2026 · Architecture: Transformer

wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5 is a 7.6-billion-parameter language model based on the Qwen2.5-7B architecture, created by wvnvwn through a linear merge of Qwen/Qwen2.5-7B and two specialized Qwen2.5-7B SSFT models. The merge is weighted to draw on the strengths of its components, in particular a GSM8K-tuned variant, suggesting an emphasis on mathematical reasoning alongside general language tasks. Its 32K context length supports processing longer inputs for diverse applications.


Model Overview

This model, wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5, is a 7.6 billion parameter language model derived from the Qwen2.5-7B family. It was created by wvnvwn using the mergekit tool and a linear merge method, combining three distinct pre-trained models to achieve a specialized performance profile.

Merge Details

The model is a composite of:

  • Qwen/Qwen2.5-7B: The foundational Qwen2.5-7B model.
  • wvnvwn/qwen-2.5-7B-SSFT-lr3e-5: A fine-tuned (SSFT) Qwen2.5-7B variant; the name suggests a 3e-5 learning rate, though its training focus is not documented.
  • wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5: Another specialized Qwen2.5-7B variant, explicitly fine-tuned on the GSM8K dataset, indicating a focus on mathematical problem-solving and reasoning.

Configuration

The merge uses mergekit's linear method with a distinct weight for each component, applied across layers 0 to 28. Notably, wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5 received a weight of 1.0, making it the dominant contributor, particularly where GSM8K training is beneficial. The base Qwen/Qwen2.5-7B was included with a negative weight of -0.5, and wvnvwn/qwen-2.5-7B-SSFT-lr3e-5 with a weight of 0.5. The weights sum to 1.0, and the blend can be read as task-vector arithmetic: merged = gsm8k + 0.5 × (SSFT − base), i.e., the GSM8K model plus half of the general SSFT model's task vector, a reading consistent with the "Resta" (Spanish for "subtraction") and "scale0.5" parts of the model name.
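For illustration, here is a minimal sketch of the per-tensor arithmetic a linear merge with these weights performs. The function and tensor names are hypothetical; the actual model was produced with mergekit, not this code.

```python
import torch

# Weights from the merge configuration described above.
WEIGHTS = {"base": -0.5, "ssft": 0.5, "gsm8k": 1.0}

def linear_merge(base: torch.Tensor, ssft: torch.Tensor, gsm8k: torch.Tensor) -> torch.Tensor:
    # A linear merge computes a weighted sum of each matching parameter
    # tensor across the input checkpoints (hypothetical sketch).
    return WEIGHTS["base"] * base + WEIGHTS["ssft"] * ssft + WEIGHTS["gsm8k"] * gsm8k

# Sanity check: the weighted sum equals the task-vector form
# gsm8k + 0.5 * (ssft - base) on random tensors.
b, s, g = (torch.randn(4, 4) for _ in range(3))
assert torch.allclose(linear_merge(b, s, g), g + 0.5 * (s - b), atol=1e-6)
```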

Potential Use Cases

Given the inclusion of a GSM8K-tuned component, this model is likely well-suited for the following (a brief loading sketch follows the list):

  • Mathematical reasoning and problem-solving
  • General language understanding and generation tasks
  • Applications requiring a blend of general knowledge and numerical aptitude
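As a starting point, here is a minimal loading-and-generation sketch using the Hugging Face transformers library. The prompt is an illustrative GSM8K-style question, and the generation settings are assumptions rather than documented defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative GSM8K-style prompt; the merged model is not necessarily
# instruction-tuned, so plain completion-style prompting is used here.
prompt = (
    "Question: A bakery sells 12 muffins per tray and bakes 7 trays. "
    "It sells all but 9 muffins. How many muffins were sold?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```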