YOYO-AI/Qwen2.5-14B-YOYO-V3

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 21, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

YOYO-AI/Qwen2.5-14B-YOYO-V3 is a 14.8 billion parameter language model based on the Qwen2.5 architecture, developed by YOYO-AI. It is the product of a multi-stage merge that combines several instruction-tuned and high-performance models using the DELLA and Model Stock methods. Compared with YOYO-AI's earlier merges, it offers improved stability and output quality, making it suitable for general language generation tasks.


Overview

YOYO-AI/Qwen2.5-14B-YOYO-V3 is a 14.8 billion parameter model built upon the Qwen2.5 architecture, developed by YOYO-AI. It refines YOYO-AI's earlier model-merging work, specifically targeting the "uncontrollable outputs" that often arise when instruction-tuned and base models are merged directly. The development process used a multi-stage merging strategy built on the DELLA and Model Stock methods.
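To make the Model Stock side of this concrete, here is a minimal sketch of the two-model Model Stock formula (interpolation ratio derived from the angle between the two fine-tuning deltas, t = 2·cosθ / (1 + cosθ)). This is an illustrative toy on flat weight vectors, not YOYO-AI's actual per-layer recipe, and the function name is hypothetical:

```python
import math

def model_stock_merge(w0, w1, w2):
    """Toy Model Stock merge of two fine-tuned weight vectors w1, w2
    that share a base w0. The blend ratio t comes from the angle
    between the two fine-tuning deltas: t = 2*cos(theta)/(1 + cos(theta)).
    The merged weights interpolate between the average of w1, w2 and w0."""
    d1 = [a - b for a, b in zip(w1, w0)]   # delta of first fine-tune
    d2 = [a - b for a, b in zip(w2, w0)]   # delta of second fine-tune
    dot = sum(x * y for x, y in zip(d1, d2))
    n1 = math.sqrt(sum(x * x for x in d1))
    n2 = math.sqrt(sum(x * x for x in d2))
    cos_theta = dot / (n1 * n2)
    t = 2 * cos_theta / (1 + cos_theta)
    avg = [(a + b) / 2 for a, b in zip(w1, w2)]
    return [t * a + (1 - t) * b for a, b in zip(avg, w0)]
```

Intuitively, when the two fine-tunes agree (small angle, cosθ → 1), the merge trusts their average; when their deltas are near-orthogonal, it falls back toward the base weights, which is what lends merges of this kind their stability.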

Key Capabilities & Development Insights

The model's creation followed a staged recipe: first, "high-divergence" instruction-focused models (Qwen2.5-14B-instruct and Qwen2.5-14B-instruct-1M) were merged into "low-divergence" high-performance models (such as Virtuoso-Small-v2 and Blossom-V6-14B) using DELLA, producing four specialized intermediate variants. These variants were then combined with a base model enhanced for roleplay and creative writing (EVA-Qwen2.5-14B-base) and further merged using the SCE method, yielding the final Qwen2.5-14B-YOYO-V3. The goal of this layered approach is greater stability without sacrificing performance.
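The DELLA step above can be sketched in miniature. DELLA merges the *deltas* of fine-tuned models relative to a shared base: low-magnitude delta parameters are dropped, the survivors are rescaled to compensate, and the sparsified deltas are averaged back into the base. The sketch below uses deterministic top-k magnitude pruning as a simplified stand-in for DELLA's magnitude-proportional stochastic dropping; the function name, the density value, and the tiny weight dicts are illustrative, not YOYO-AI's actual configuration:

```python
def della_style_merge(base, tuned_models, density=0.5):
    """Toy delta merge: for each fine-tuned model, keep only the
    largest-magnitude fraction `density` of its deltas from the base,
    rescale the survivors by 1/density, then average the sparse
    deltas into the base weights. Weights are dicts of flat lists."""
    merged = {}
    for name in base:
        sparse_deltas = []
        for tuned in tuned_models:
            delta = [t - b for t, b in zip(tuned[name], base[name])]
            # keep the top `density` fraction of parameters by |delta|
            k = max(1, int(len(delta) * density))
            keep = set(sorted(range(len(delta)),
                              key=lambda i: -abs(delta[i]))[:k])
            # rescale survivors so the expected delta magnitude is preserved
            sparse_deltas.append(
                [d / density if i in keep else 0.0
                 for i, d in enumerate(delta)])
        # element-wise mean of the sparsified deltas, added back to the base
        merged[name] = [
            b + sum(d[i] for d in sparse_deltas) / len(sparse_deltas)
            for i, b in enumerate(base[name])
        ]
    return merged
```

The pruning is what keeps a merge of several instruction-tuned donors from interfering with each other: each donor contributes only its strongest parameter changes, which is the property the card credits for taming "uncontrollable outputs."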

Performance Metrics

Evaluations on the Open LLM Leaderboard show an average score of 42.56. Notable scores include 83.98 on IFEval (0-Shot) and 49.47 on BBH (3-Shot), indicating strong instruction following and reasoning capabilities. The model also achieves 53.55 on MATH Lvl 5 (4-Shot) and 46.74 on MMLU-PRO (5-Shot).