Overview
YOYO-AI/Qwen2.5-14B-YOYO-V3 is a 14.8-billion-parameter model built on the Qwen2.5 architecture by YOYO-AI. It was produced through a multi-stage merging strategy that combines the DELLA and Model Stock methods, designed specifically to avoid the "uncontrollable outputs" that often affect naive merges of instruction-tuned and base models. A sketch of the Model Stock rule follows.
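To make the Model Stock stage concrete, here is a minimal per-tensor sketch of that method's interpolation rule (Jang et al., 2024), assuming two fine-tuned checkpoints that share a pretrained anchor. The helper name `model_stock_layer` and the per-tensor framing are illustrative, not part of YOYO-AI's actual pipeline or any specific library.

```python
# Minimal sketch of Model Stock's per-tensor merge rule; function name is hypothetical.
import torch

def model_stock_layer(w0: torch.Tensor, w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    """Merge one weight tensor from two fine-tuned models (w1, w2) and their anchor (w0)."""
    d1, d2 = (w1 - w0).flatten(), (w2 - w0).flatten()        # fine-tuning deltas
    cos = torch.dot(d1, d2) / (d1.norm() * d2.norm() + 1e-12)
    t = 2 * cos / (1 + cos)                                  # interpolation ratio from delta angle
    w_avg = (w1 + w2) / 2                                    # center of the fine-tuned weights
    return t * w_avg + (1 - t) * w0                          # pull the average toward the anchor
```

The key idea is that the angle between the two fine-tuning deltas determines how far the merged weights are pulled back toward the pretrained anchor, which is what gives Model Stock merges their stability.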
Key Capabilities & Development Insights
Development began by merging "high-divergence" instruction-tuned models (Qwen2.5-14B-Instruct and Qwen2.5-14B-Instruct-1M) into "low-divergence" high-performance models (such as Virtuoso-Small-v2 and Blossom-V6-14B) with DELLA, producing four specialized intermediate variants. These variants were then combined with EVA-Qwen2.5-14B-base, a base model enhanced for roleplay and creative writing, and further context-extended using the SCE method, yielding the final Qwen2.5-14B-YOYO-V3. This staged process aims for greater stability and performance than a single direct merge; a simplified sketch of the DELLA step follows.
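Below is a simplified sketch of a DELLA-style merge, assuming donor checkpoints expressed as task vectors (deltas from a shared base): entries are randomly dropped with magnitude-adaptive probabilities, the survivors are rescaled to preserve expectation, and the pruned deltas are fused back onto the base. The published method additionally ranks magnitudes within rows and performs TIES-style sign election before fusing; the function names here are hypothetical and do not reflect YOYO-AI's exact configuration.

```python
# Illustrative DELLA-style delta pruning and fusion; a simplification, not the full method.
import torch

def magprune(delta: torch.Tensor, p_mean: float = 0.5, eps: float = 0.1) -> torch.Tensor:
    """Drop delta entries with probability inversely related to magnitude, then rescale."""
    ranks = delta.abs().flatten().argsort().argsort().float()  # 0 = smallest magnitude
    ranks = ranks / max(ranks.numel() - 1, 1)                  # normalize ranks to [0, 1]
    p_drop = (p_mean + eps) - 2 * eps * ranks                  # large entries dropped less often
    p_drop = p_drop.clamp(0.0, 1.0).reshape(delta.shape)
    keep = torch.bernoulli(1 - p_drop)
    return delta * keep / (1 - p_drop).clamp_min(1e-6)         # rescale to keep expectation

def della_merge(base: torch.Tensor, deltas: list[torch.Tensor]) -> torch.Tensor:
    """Fuse several pruned task vectors back onto the shared base weights."""
    pruned = [magprune(d) for d in deltas]
    return base + torch.stack(pruned).mean(dim=0)
```

Dropping low-magnitude delta entries more aggressively, then rescaling the survivors, reduces interference between donors when several deltas are fused onto the same base.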
Performance Metrics
Evaluations on the Open LLM Leaderboard show an average score of 42.56. Notable results include 83.98 on IFEval (0-shot) and 49.47 on BBH (3-shot), indicating strong instruction following and reasoning. The model also achieves 53.55 on MATH Lvl 5 (4-shot) and 46.74 on MMLU-PRO (5-shot).
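For readers who want to rerun a comparable suite locally, here is a hedged sketch using lm-evaluation-harness's Python entry point. The leaderboard_* task names follow the Open LLM Leaderboard v2 conventions and can differ across harness versions; the dtype and batch size are illustrative, and exact scores will depend on the pinned harness version and hardware.

```python
# Sketch: rerunning leaderboard-style tasks with lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=YOYO-AI/Qwen2.5-14B-YOYO-V3,dtype=bfloat16",
    tasks=[
        "leaderboard_ifeval",     # instruction following (0-shot)
        "leaderboard_bbh",        # BBH (3-shot)
        "leaderboard_math_hard",  # MATH Lvl 5 (4-shot)
        "leaderboard_mmlu_pro",   # MMLU-PRO (5-shot)
    ],
    batch_size=4,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```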