YOYO-AI/Qwen2.5-7B-YOYO-super
YOYO-AI/Qwen2.5-7B-YOYO-super is a 7.6 billion parameter merged language model based on the Qwen2.5 architecture, developed by YOYO-AI. This model is the result of extensive merging experiments, combining a base model with multiple fine-tuned Qwen2.5-7B variants using advanced merging techniques like 'della' and 'ties'. It is specifically optimized to retain core knowledge while significantly improving mathematical and coding abilities, making it suitable for complex reasoning tasks.
Loading preview...
YOYO-AI/Qwen2.5-7B-YOYO-super: An Optimized Merged Model
YOYO-AI/Qwen2.5-7B-YOYO-super is a 7.6 billion parameter language model built upon the Qwen2.5 architecture, developed by YOYO-AI. This model represents the culmination of numerous merging experiments, aiming to achieve an optimal balance between a base model and two fine-tuned models.
Key Capabilities & Improvements
This generation of the YOYO-super model addresses deficiencies found in previous merging approaches, specifically:
- Enhanced Knowledge Retention: Significantly improved retention of the basic model's knowledge compared to prior iterations.
- Stronger Mathematical Abilities: Demonstrates substantial improvements in handling mathematical tasks.
- Improved Coding Performance: Shows marked progress in coding capabilities.
- Advanced Merging Strategy: Utilizes a sophisticated multi-stage merging process, combining 'della' and 'ties' methods on various Qwen2.5-7B-instruct and Qwen2.5-7B-instruct-1M variants.
While there might be a slight decrease in instruction following, the overall performance across other critical aspects has seen significant gains. YOYO-AI emphasizes transparency by publishing the complete merging formula, contributing to the open-source community's understanding of model merging techniques.
Good For
- Applications requiring strong mathematical reasoning.
- Code generation and understanding tasks.
- Use cases where retaining broad foundational knowledge alongside specialized skills is crucial.