bunnycore/Qwen2.5-7B-MixStock-Sce-V0.3

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Feb 13, 2025 · Architecture: Transformer

bunnycore/Qwen2.5-7B-MixStock-Sce-V0.3 is a 7.6 billion parameter language model based on the Qwen2.5 architecture, developed by bunnycore. This model was created using the SCE merge method, combining four pre-trained models (OpenR1-Qwen-7B, OpenThinker-7B, AceInstruct-7B, and Qwen2.5-7B-MixStock-V0.1) with Qwen2.5-7B-RRP-1M as the base. It is designed to leverage the strengths of its constituent models, offering a versatile foundation for a variety of generative AI tasks with a 32768 token context length.


Model Overview

bunnycore/Qwen2.5-7B-MixStock-Sce-V0.3 is a 7.6 billion parameter language model built on the Qwen2.5 architecture. Developed by bunnycore, it was produced with the SCE merge method, which combines the weights of multiple pre-trained models into a single, more capable checkpoint.

Merge Details

This model was constructed using MergeKit with bunnycore/Qwen2.5-7B-RRP-1M serving as the base model. The SCE method was applied to integrate the following models:

  • open-r1/OpenR1-Qwen-7B
  • open-thoughts/OpenThinker-7B
  • nvidia/AceInstruct-7B
  • bunnycore/Qwen2.5-7B-MixStock-V0.1

This merge aims to consolidate the complementary strengths of its components, improving overall performance and versatility. The configuration used select_topk: 1.5 and int8_mask: true, with a bfloat16 data type; a sketch of a matching MergeKit configuration follows.
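The full YAML used for this model is not reproduced on this card, so the sketch below reconstructs a plausible MergeKit configuration from the details above. Only the method, base model, source models, select_topk, int8_mask, and dtype values come from the card; the field layout follows MergeKit's documented YAML schema, and the file name sce-merge.yaml is illustrative.

```python
# Reconstruct a MergeKit config matching the details on this card and write it
# to disk for use with the mergekit-yaml CLI. Requires the pyyaml package.
import yaml

merge_config = {
    "merge_method": "sce",
    "base_model": "bunnycore/Qwen2.5-7B-RRP-1M",
    "models": [
        {"model": "open-r1/OpenR1-Qwen-7B"},
        {"model": "open-thoughts/OpenThinker-7B"},
        {"model": "nvidia/AceInstruct-7B"},
        {"model": "bunnycore/Qwen2.5-7B-MixStock-V0.1"},
    ],
    "parameters": {
        "select_topk": 1.5,  # value as reported on this card
        "int8_mask": True,
    },
    "dtype": "bfloat16",
}

with open("sce-merge.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)

# Then, from a shell with mergekit installed:
#   mergekit-yaml sce-merge.yaml ./merged-model
```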

Key Characteristics

  • Architecture: Based on the Qwen2.5 family.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Development Method: Built with the SCE merge method via MergeKit, fusing four source models on a shared base to combine their strengths.
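As a quick sanity check of the characteristics above, the model's configuration can be inspected without downloading the full weights. A minimal sketch using the Hugging Face transformers AutoConfig API:

```python
# Fetch only the model's config file and print the fields that correspond to
# the characteristics listed above. Requires the transformers package.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bunnycore/Qwen2.5-7B-MixStock-Sce-V0.3")

print(config.model_type)               # "qwen2" for the Qwen2.5 family
print(config.max_position_embeddings)  # context window; listed as 32k on this card
```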

Potential Use Cases

Given its merged nature, this model is likely suited to a broad range of applications that draw on its constituent models: general-purpose text generation and comprehension, along with reasoning- and instruction-following-oriented tasks inherited from components such as OpenThinker-7B and AceInstruct-7B.
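Assuming the merged tokenizer retains Qwen2.5's chat template and sufficient GPU memory is available for a 7.6B model in bfloat16, a minimal text-generation sketch with the Hugging Face transformers library might look like this (the prompt and sampling settings are illustrative, not tuned recommendations):

```python
# Minimal chat-style generation with transformers. The model ID comes from
# this card; everything else is a generic usage pattern.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bunnycore/Qwen2.5-7B-MixStock-Sce-V0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain model merging in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```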