YOYO-AI/Qwen2.5-14B-YOYO-V3

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 21, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

YOYO-AI/Qwen2.5-14B-YOYO-V3 is a 14.8 billion parameter language model based on the Qwen2.5 architecture, developed by YOYO-AI. It is the product of a multi-stage merge that combines several instruction-tuned and high-performance models using the DELLA and Model Stock methods. Compared with YOYO-AI's earlier merges, it offers improved stability and output quality, making it suitable for general language generation tasks.


Overview

YOYO-AI/Qwen2.5-14B-YOYO-V3 is a 14.8 billion parameter model built upon the Qwen2.5 architecture, developed by YOYO-AI. It refines YOYO-AI's earlier model-merging work, specifically targeting the "uncontrollable outputs" that often arise when instruction-tuned and base models are merged directly. The development process used a multi-stage merging strategy built on the DELLA and Model Stock methods.
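To make the Model Stock side of this concrete, here is a minimal sketch of the two-model Model Stock formula (interpolation ratio derived from the angle between the two fine-tuning deltas, t = 2·cosθ / (1 + cosθ)). This is an illustrative toy on flat weight vectors, not YOYO-AI's actual per-layer recipe, and the function name is hypothetical:

```python
import math

def model_stock_merge(w0, w1, w2):
    """Toy Model Stock merge of two fine-tuned weight vectors w1, w2
    that share a base w0. The blend ratio t comes from the angle
    between the two fine-tuning deltas: t = 2*cos(theta)/(1 + cos(theta)).
    The merged weights interpolate between the average of w1, w2 and w0."""
    d1 = [a - b for a, b in zip(w1, w0)]   # delta of first fine-tune
    d2 = [a - b for a, b in zip(w2, w0)]   # delta of second fine-tune
    dot = sum(x * y for x, y in zip(d1, d2))
    n1 = math.sqrt(sum(x * x for x in d1))
    n2 = math.sqrt(sum(x * x for x in d2))
    cos_theta = dot / (n1 * n2)
    t = 2 * cos_theta / (1 + cos_theta)
    avg = [(a + b) / 2 for a, b in zip(w1, w2)]
    return [t * a + (1 - t) * b for a, b in zip(avg, w0)]
```

Intuitively, when the two fine-tunes agree (small angle, cosθ → 1), the merge trusts their average; when their deltas are near-orthogonal, it falls back toward the base weights, which is what lends merges of this kind their stability.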

Key Capabilities & Development Insights

The model's creation followed a staged recipe: first, "high-divergence" instruction-focused models (Qwen2.5-14B-instruct and Qwen2.5-14B-instruct-1M) were merged into "low-divergence" high-performance models (such as Virtuoso-Small-v2 and Blossom-V6-14B) using DELLA, producing four specialized intermediate variants. These variants were then combined with a base model enhanced for roleplay and creative writing (EVA-Qwen2.5-14B-base) and further merged using the SCE method, yielding the final Qwen2.5-14B-YOYO-V3. The goal of this layered approach is greater stability without sacrificing performance.
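The DELLA step above can be sketched in miniature. DELLA merges the *deltas* of fine-tuned models relative to a shared base: low-magnitude delta parameters are dropped, the survivors are rescaled to compensate, and the sparsified deltas are averaged back into the base. The sketch below uses deterministic top-k magnitude pruning as a simplified stand-in for DELLA's magnitude-proportional stochastic dropping; the function name, the density value, and the tiny weight dicts are illustrative, not YOYO-AI's actual configuration:

```python
def della_style_merge(base, tuned_models, density=0.5):
    """Toy delta merge: for each fine-tuned model, keep only the
    largest-magnitude fraction `density` of its deltas from the base,
    rescale the survivors by 1/density, then average the sparse
    deltas into the base weights. Weights are dicts of flat lists."""
    merged = {}
    for name in base:
        sparse_deltas = []
        for tuned in tuned_models:
            delta = [t - b for t, b in zip(tuned[name], base[name])]
            # keep the top `density` fraction of parameters by |delta|
            k = max(1, int(len(delta) * density))
            keep = set(sorted(range(len(delta)),
                              key=lambda i: -abs(delta[i]))[:k])
            # rescale survivors so the expected delta magnitude is preserved
            sparse_deltas.append(
                [d / density if i in keep else 0.0
                 for i, d in enumerate(delta)])
        # element-wise mean of the sparsified deltas, added back to the base
        merged[name] = [
            b + sum(d[i] for d in sparse_deltas) / len(sparse_deltas)
            for i, b in enumerate(base[name])
        ]
    return merged
```

The pruning is what keeps a merge of several instruction-tuned donors from interfering with each other: each donor contributes only its strongest parameter changes, which is the property the card credits for taming "uncontrollable outputs."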

Performance Metrics

Evaluations on the Open LLM Leaderboard show an average score of 42.56. Notable scores include 83.98 on IFEval (0-Shot) and 49.47 on BBH (3-Shot), indicating strong instruction following and reasoning capabilities. The model also achieves 53.55 on MATH Lvl 5 (4-Shot) and 46.74 on MMLU-PRO (5-Shot).