marcuscedricridia/Hush-Qwen2.5-7B-MST-v1.3
marcuscedricridia/Hush-Qwen2.5-7B-MST-v1.3 is a 7.6-billion-parameter language model based on the Qwen2.5-7B architecture, created by marcuscedricridia. It was built with the Model Stock merge method from five Qwen2.5-7B fine-tunes and is intended as a general-purpose model for natural language processing tasks, with a 32K-token context length.
Model Overview
marcuscedricridia/Hush-Qwen2.5-7B-MST-v1.3 is a 7.6-billion-parameter language model built on the Qwen2.5-7B base architecture. Developed by marcuscedricridia, it uses the Model Stock merge method to combine several fine-tuned Qwen2.5-7B variants into a single set of weights.
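Model Stock (Jang et al., 2024) merges each weight tensor by interpolating between the plain average of the fine-tuned weights and the base weights, choosing the ratio from the angle between the models' task vectors (fine-tuned minus base). The following is a minimal pure-Python sketch on flattened tensors, assuming the published ratio t = k·cos θ / (1 + (k − 1)·cos θ) with cos θ estimated as the mean pairwise cosine similarity of the task vectors. It illustrates the technique only; it is not the code used to produce this model, and the function names are hypothetical.

```python
import itertools
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def interpolation_ratio(k, cos_theta):
    """Model Stock ratio t = k*cos(theta) / (1 + (k - 1)*cos(theta))."""
    return k * cos_theta / (1 + (k - 1) * cos_theta)


def model_stock(base, finetuned):
    """Merge one flattened weight tensor; returns (merged_weights, t)."""
    k = len(finetuned)
    if k < 2:
        raise ValueError("Model Stock needs at least two fine-tuned models")
    # Task vectors: each fine-tuned tensor minus the base tensor.
    deltas = [[w - w0 for w, w0 in zip(ft, base)] for ft in finetuned]
    # Estimate cos(theta) as the mean pairwise cosine between task vectors.
    pairs = list(itertools.combinations(deltas, 2))
    cos_theta = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    t = interpolation_ratio(k, cos_theta)
    # Interpolate between the plain average of the fine-tuned weights and the base.
    avg = [sum(col) / k for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * w0 for a, w0 in zip(avg, base)]
    return merged, t


base = [0.0, 0.0, 0.0]
finetuned = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.3], [1.0, 0.1, 0.1]]
merged, t = model_stock(base, finetuned)
print(f"t = {t:.3f}", [round(w, 3) for w in merged])
```

The ratio behaves as the method intends: when the task vectors point the same way (cos θ = 1), t = 1 and the result is the plain average; when they are orthogonal (cos θ = 0), t = 0 and the result falls back to the base weights. For five models, as in this merge, cos θ = 0.5 would give t = 2.5 / 3 ≈ 0.83. Edge cases such as 1 + (k − 1)·cos θ = 0 are not handled in this sketch.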
Merge Details
This model is a merge of the following specialized Qwen2.5-7B models:
- Etherll/Qwen2.5-7B-della-test
- marcuscedricridia/Hush-Qwen2.5-7B-Preview
- marcuscedricridia/absolute-o1-7b
- marcuscedricridia/sbr-o1-7b
- marcuscedricridia/Hush-Qwen2.5-7B-RP-v1.1-1M
The merge was performed in bfloat16, with int8_mask enabled and normalization applied, and the tokenizer was taken from the base model. The aim is to combine the strengths of the constituent models in a single model rather than to add new training.
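The settings described above can be written down as a merge configuration. The sketch below assembles one as YAML text from Python. It assumes the merge was run with mergekit and uses mergekit-style key names (merge_method, base_model, models, dtype, parameters, tokenizer_source); the card names neither the tool nor the base model, so BASE_MODEL is a hypothetical placeholder, not the real value.

```python
# HYPOTHETICAL placeholder: the card does not say which model served as the base.
BASE_MODEL = "<base-model-not-stated>"

# Constituent models as listed on the card.
MODELS = [
    "Etherll/Qwen2.5-7B-della-test",
    "marcuscedricridia/Hush-Qwen2.5-7B-Preview",
    "marcuscedricridia/absolute-o1-7b",
    "marcuscedricridia/sbr-o1-7b",
    "marcuscedricridia/Hush-Qwen2.5-7B-RP-v1.1-1M",
]


def build_config(base_model, models):
    """Return a mergekit-style YAML config (key names are an assumption)."""
    lines = ["merge_method: model_stock", f"base_model: {base_model}", "models:"]
    lines += [f"  - model: {name}" for name in models]
    lines += [
        "dtype: bfloat16",          # merge computed in bfloat16
        "parameters:",
        "  int8_mask: true",        # int8_mask enabled
        "  normalize: true",        # normalization applied
        "tokenizer_source: base",   # tokenizer inherited from the base model
    ]
    return "\n".join(lines) + "\n"


cfg = build_config(BASE_MODEL, MODELS)
print(cfg)
```

Generating the text by hand keeps the example dependency-free; a YAML library would work equally well if one is available.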
Key Characteristics
- Architecture: Qwen2.5-7B base
- Parameter Count: 7.6 billion
- Context Length: 32,768 tokens
- Development Method: Model Stock merge of five fine-tuned Qwen2.5-7B models
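As a rough sizing aid, the parameter count and context length above can be turned into memory figures. This back-of-the-envelope sketch counts weights only, at 2 bytes per parameter in bfloat16. The KV-cache estimate assumes the published Qwen2.5-7B attention layout (28 layers, 4 key/value heads, head dimension 128), which this card does not state. Activations and framework overhead are ignored.

```python
GIB = 1024 ** 3

# Bytes per stored value for the dtypes compared here.
BYTES_PER_VALUE = {"float32": 4, "bfloat16": 2}


def weight_bytes(num_params, dtype="bfloat16"):
    """Memory for the weights alone."""
    return num_params * BYTES_PER_VALUE[dtype]


def kv_cache_bytes(tokens, layers=28, kv_heads=4, head_dim=128, dtype="bfloat16"):
    """KV-cache memory for one sequence.

    Defaults are ASSUMED Qwen2.5-7B values (not stated on this card).
    The leading factor 2 counts one key and one value vector per KV head per layer.
    """
    return 2 * layers * kv_heads * head_dim * BYTES_PER_VALUE[dtype] * tokens


params = 7_600_000_000  # "7.6 billion" from the card (rounded)
context = 32_768        # full context length from the card

print(f"weights (bfloat16): {weight_bytes(params) / GIB:.1f} GiB")
print(f"KV cache at {context} tokens: {kv_cache_bytes(context) / GIB:.2f} GiB")
```

Under these assumptions the weights take about 14.2 GiB (15.2 GB), and one full-length 32,768-token sequence adds about 1.75 GiB of KV cache.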
Potential Use Cases
Because it is a merge, the model may inherit some of the strengths of its constituent models, which suggests uses such as:
- General text generation and understanding
- Role-playing scenarios (drawing on Hush-Qwen2.5-7B-RP-v1.1-1M)
- Tasks requiring robust language capabilities from its diverse training base