YOYO-AI/Qwen2.5-14B-YOYO-1010
YOYO-AI/Qwen2.5-14B-YOYO-1010 is a 14.8-billion-parameter language model created by YOYO-AI, merged with the DELLA method using Qwen/Qwen2.5-14B as its base. The merge incorporates Qwen2.5-14B-instruct and supports a 131,072-token context length. As a merged model, it is intended to combine the strengths of its constituent checkpoints for general language understanding and generation tasks.
Overview
YOYO-AI/Qwen2.5-14B-YOYO-1010 is a 14.8-billion-parameter language model developed by YOYO-AI. It is a merged model created with the mergekit tool, using the DELLA merge method.
Key Characteristics
- Base Model: The merging process utilized `Qwen/Qwen2.5-14B` as the foundational base model.
- Merged Components: The primary model integrated into this merge is `Qwen/Qwen2.5-14B-instruct`, suggesting an emphasis on instruction-following capabilities.
- Merge Method: The DELLA method was applied with `density: 1`, `weight: 1`, and `lambda: 0.9` for the `Qwen/Qwen2.5-14B-instruct` component, similar parameters for the overall merge, and `normalize: true` and `int8_mask: true` (a configuration sketch follows this list).
- Data Type: The model is configured to use `bfloat16` for its operations.
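
For readers who want to see how these parameters fit together, the sketch below reconstructs them as a mergekit-style configuration. It follows mergekit's public YAML schema and `mergekit-yaml` CLI conventions, but the file name and exact layout are illustrative assumptions, not the authors' published config.

```python
# Sketch: expressing the DELLA merge parameters listed above as a mergekit
# YAML config. Schema follows mergekit's documented format; the output path
# and file name are illustrative, not the published configuration.
import yaml

merge_config = {
    "merge_method": "della",
    "base_model": "Qwen/Qwen2.5-14B",
    "models": [
        {
            "model": "Qwen/Qwen2.5-14B-instruct",
            "parameters": {"density": 1, "weight": 1, "lambda": 0.9},
        }
    ],
    "parameters": {"normalize": True, "int8_mask": True},
    "dtype": "bfloat16",
}

with open("della-merge.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)

# The resulting file would then be passed to the mergekit CLI, e.g.:
#   mergekit-yaml della-merge.yaml ./Qwen2.5-14B-YOYO-1010
```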
Potential Use Cases
Given its architecture and the inclusion of an instruction-tuned model, YOYO-AI/Qwen2.5-14B-YOYO-1010 is likely suitable for:
- General-purpose text generation and understanding.
- Applications requiring robust instruction following (a minimal inference sketch follows this list).
- Tasks benefiting from a large context window (131072 tokens).
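
As a rough illustration of these use cases, the following is a minimal inference sketch using the Hugging Face transformers library. It assumes the repository ships a Qwen2.5-style chat template and that the bfloat16 weights fit on the available hardware; the prompt and generation settings are placeholders.

```python
# Minimal inference sketch with Hugging Face transformers (assumed setup,
# not an official usage snippet from the model authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/Qwen2.5-14B-YOYO-1010"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's configured dtype
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the DELLA merge method in two sentences."},
]
# Assumes the tokenizer provides a Qwen2.5-style chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```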
This model represents an effort to combine and optimize the capabilities of existing Qwen2.5-14B variants through a structured merging approach.