wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quantization: FP8 · Context Length: 32K · Published: Apr 29, 2026 · Architecture: Transformer

wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5 is a 7.6-billion-parameter language model based on the Qwen2.5-7B architecture, created by wvnvwn through a linear merge of Qwen/Qwen2.5-7B and two specialized Qwen2.5-7B SSFT models. The merge is weighted to draw on the strengths of its components, in particular a GSM8K-tuned variant, suggesting an emphasis on mathematical reasoning alongside general language tasks. Its 32K context length supports processing longer inputs for diverse applications.


Model Overview

This model, wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5, is a 7.6 billion parameter language model derived from the Qwen2.5-7B family. It was created by wvnvwn using the mergekit tool and a linear merge method, combining three distinct pre-trained models to achieve a specialized performance profile.

Merge Details

The model is a composite of:

  • Qwen/Qwen2.5-7B: The foundational Qwen2.5-7B model.
  • wvnvwn/qwen-2.5-7B-SSFT-lr3e-5: A fine-tuned (SSFT) Qwen2.5-7B variant; the name suggests a 3e-5 learning rate, though its training focus is not documented.
  • wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5: Another specialized Qwen2.5-7B variant, explicitly fine-tuned on the GSM8K dataset, indicating a focus on mathematical problem-solving and reasoning.

Configuration

The merge uses mergekit's linear method with a distinct weight for each component, applied across layers 0 to 28. Notably, wvnvwn/qwen-2.5-7B-SSFT-gsm8k-lr3e-5 received a weight of 1.0, making it the dominant contributor, particularly where GSM8K training is beneficial. The base Qwen/Qwen2.5-7B was included with a negative weight of -0.5, and wvnvwn/qwen-2.5-7B-SSFT-lr3e-5 with a weight of 0.5. The weights sum to 1.0, and the blend can be read as task-vector arithmetic: merged = gsm8k + 0.5 × (SSFT − base), i.e., the GSM8K model plus half of the general SSFT model's task vector, a reading consistent with the "Resta" (Spanish for "subtraction") and "scale0.5" parts of the model name.
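For illustration, here is a minimal sketch of the per-tensor arithmetic a linear merge with these weights performs. The function and tensor names are hypothetical; the actual model was produced with mergekit, not this code.

```python
import torch

# Weights from the merge configuration described above.
WEIGHTS = {"base": -0.5, "ssft": 0.5, "gsm8k": 1.0}

def linear_merge(base: torch.Tensor, ssft: torch.Tensor, gsm8k: torch.Tensor) -> torch.Tensor:
    # A linear merge computes a weighted sum of each matching parameter
    # tensor across the input checkpoints (hypothetical sketch).
    return WEIGHTS["base"] * base + WEIGHTS["ssft"] * ssft + WEIGHTS["gsm8k"] * gsm8k

# Sanity check: the weighted sum equals the task-vector form
# gsm8k + 0.5 * (ssft - base) on random tensors.
b, s, g = (torch.randn(4, 4) for _ in range(3))
assert torch.allclose(linear_merge(b, s, g), g + 0.5 * (s - b), atol=1e-6)
```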

Potential Use Cases

Given the inclusion of a GSM8K-tuned component, this model is likely well-suited for the following (a brief loading sketch follows the list):

  • Mathematical reasoning and problem-solving
  • General language understanding and generation tasks
  • Applications requiring a blend of general knowledge and numerical aptitude
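As a starting point, here is a minimal loading-and-generation sketch using the Hugging Face transformers library. The prompt is an illustrative GSM8K-style question, and the generation settings are assumptions rather than documented defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wvnvwn/qwen-2.5-7B-Resta-lr3e-5-scale0.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative GSM8K-style prompt; the merged model is not necessarily
# instruction-tuned, so plain completion-style prompting is used here.
prompt = (
    "Question: A bakery sells 12 muffins per tray and bakes 7 trays. "
    "It sells all but 9 muffins. How many muffins were sold?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```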