YOYO-AI/Qwen2.5-14B-YOYO-1010

Text generation · Concurrency cost: 1 · Model size: 14.8B · Quant: FP8 · Context length: 32k · License: apache-2.0 · Architecture: Transformer · Open weights

YOYO-AI/Qwen2.5-14B-YOYO-1010 is a 14.8-billion-parameter language model created by YOYO-AI, merged with the DELLA method using Qwen/Qwen2.5-14B as its base. It builds on the Qwen2.5-14B-instruct architecture and offers a 131072-token context length. As a merged model, it combines the strengths of its constituent checkpoints for stronger performance in general language understanding and generation tasks.


Overview

YOYO-AI/Qwen2.5-14B-YOYO-1010 is a 14.8 billion parameter language model developed by YOYO-AI. It is a merged model, created using the mergekit tool, specifically employing the DELLA merge method.

Key Characteristics

  • Base Model: The merging process utilized Qwen/Qwen2.5-14B as the foundational base model.
  • Merged Components: The primary model integrated into this merge is Qwen/Qwen2.5-14B-instruct, suggesting an emphasis on instruction-following capabilities.
  • Merge Method: The DELLA method was applied. The Qwen/Qwen2.5-14B-instruct component uses density: 1, weight: 1, and lambda: 0.9; the same values apply to the overall merge, together with normalize: true and int8_mask: true.
  • Data Type: The model is configured to use bfloat16 for its operations.
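Put together, the characteristics above correspond to a mergekit configuration roughly like the following. This is a reconstruction from the stated parameter values, not the exact file published by YOYO-AI; the field layout follows mergekit's documented YAML schema for the DELLA method.

```yaml
# Hypothetical mergekit config reconstructed from the parameters listed above.
models:
  - model: Qwen/Qwen2.5-14B-instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
```

With a config like this, `mergekit-yaml config.yaml ./output-dir` would produce the merged checkpoint.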

Potential Use Cases

Given its architecture and the inclusion of an instruction-tuned model, YOYO-AI/Qwen2.5-14B-YOYO-1010 is likely suitable for:

  • General-purpose text generation and understanding.
  • Applications requiring robust instruction following.
  • Tasks benefiting from a large context window (131072 tokens).
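For the use cases above, the model can be loaded like any other Qwen2.5-family checkpoint. A minimal sketch using Hugging Face transformers follows; it assumes the repository ships a standard tokenizer and chat template, and it downloads the full 14.8B-parameter weights, so it needs a GPU with sufficient memory (no expected output is shown, since generation is nondeterministic by default).

```python
# Minimal usage sketch (assumption: standard Qwen2.5 chat template in the repo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/Qwen2.5-14B-YOYO-1010"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's bfloat16 dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the DELLA merge method in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The large context window means long documents can be passed in the same way; only `max_new_tokens` bounds the generated continuation.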

This model represents an effort to combine and optimize the capabilities of existing Qwen2.5-14B variants through a structured merging approach.