wvnvwn/qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3

Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Apr 29, 2026 · Architecture: Transformer

wvnvwn/qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3 is a 7.6-billion-parameter instruction-tuned language model produced by a weighted linear merge of Qwen/Qwen2.5-7B-Instruct and two specialized Qwen 2.5 7B instruction-tuned variants. It uses the Qwen 2.5 architecture and supports a context length of 32,768 tokens. The merge weights were chosen to emphasize the contributions of the specialized components over the base model.


Model Overview

This model, wvnvwn/qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.3, is a 7.6 billion parameter instruction-tuned language model built upon the Qwen 2.5 architecture. It was created using a linear merge method via mergekit, combining three distinct pre-trained models to achieve its characteristics.

Merge Details

The model is a composite of:

  • Qwen/Qwen2.5-7B-Instruct: The base instruction-tuned model from Qwen.
  • wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5: A specialized instruction-tuned variant.
  • wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5: Another specialized instruction-tuned variant, likely with a focus on mathematical reasoning given the 'gsm8k' identifier.

Configuration

The merge utilized a specific weighting scheme:

  • wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5: weight 1.0.
  • wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5: weight 0.3.
  • Qwen/Qwen2.5-7B-Instruct: weight -0.3.
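The weighting above can be written as a mergekit linear-merge configuration. The YAML below is a reconstruction from the listed weights, not the published config file; details such as the `dtype` setting are assumptions.

```yaml
# Reconstructed mergekit config (not the published original; dtype is assumed).
merge_method: linear
models:
  - model: wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5
    parameters:
      weight: 1.0
  - model: wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5
    parameters:
      weight: 0.3
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: -0.3
dtype: bfloat16
```

With mergekit installed, a config like this would typically be run via `mergekit-yaml config.yml ./merged-output` to produce the merged checkpoint.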

This configuration emphasizes the specialized SSFT models while subtracting a scaled copy of the base Qwen 2.5 Instruct weights, which tends to isolate the changes introduced by the SSFT fine-tuning rather than re-adding the base model's behavior. Developers can explore this model for tasks where a blend of these specific instruction-tuned capabilities is desired.
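Numerically, the paired +0.3 and -0.3 weights make the merge equivalent to taking the gsm8k model and adding the difference (SSFT − base) scaled by 0.3, which is consistent with the "scale0.3" suffix in the model name. A minimal sketch of that equivalence, using tiny stand-in lists rather than real weight tensors (all values here are illustrative):

```python
# Toy 3-element "tensors" standing in for full model weights (illustrative values only).
base  = [1.0, 2.0, 3.0]   # Qwen/Qwen2.5-7B-Instruct
ssft  = [1.5, 2.5, 3.5]   # wvnvwn/qwen-2.5-7B-Instruct-SSFT-lr5e-5
gsm8k = [2.0, 1.0, 4.0]   # wvnvwn/qwen-2.5-7B-Instruct-SSFT-gsm8k-lr5e-5

# Linear merge with the model card's weights: 1.0, 0.3, -0.3.
merged = [1.0 * g + 0.3 * s - 0.3 * b for g, s, b in zip(gsm8k, ssft, base)]

# Identical result: gsm8k plus the scaled "task vector" (ssft - base).
task_vector = [0.3 * (s - b) for s, b in zip(ssft, base)]
merged_alt = [g + t for g, t in zip(gsm8k, task_vector)]

assert all(abs(m - a) < 1e-9 for m, a in zip(merged, merged_alt))
```

In an actual merge the same arithmetic is applied tensor-by-tensor across every parameter of the three checkpoints.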