Model Overview
jan-hq/supermario-v2 is a 7-billion-parameter language model developed by Jan HQ. It is built on the Mistral-7B-v0.1 base model and uses the DARE_TIES merge method to combine the strengths of three fine-tuned models: OpenHermes-2.5-neural-chat-v3-3-Slerp, MetaMath-Cybertron-Starling, and Marcoroni-7B-v3. This merging strategy aims to create a versatile model with improved general performance.
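To make the merge method concrete, here is a minimal PyTorch sketch of the DARE_TIES idea: DARE randomly drops a fraction of each fine-tune's parameter deltas and rescales the survivors, and TIES keeps only the deltas that agree with the per-weight majority sign before they are added back to the base weights. The drop rate, toy tensors, and helper names below are illustrative assumptions, not the actual supermario-v2 recipe, which was presumably produced by a dedicated merge tool over full checkpoints.

```python
# Illustrative sketch of DARE_TIES on a single weight tensor; the drop
# rate and tensor shapes are assumptions, not supermario-v2's recipe.
import torch

def dare(delta: torch.Tensor, drop_rate: float) -> torch.Tensor:
    """DARE: randomly Drop delta parameters And REscale the survivors."""
    keep = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    return delta * keep / (1.0 - drop_rate)  # rescale to preserve expectation

def ties_merge(deltas: list[torch.Tensor]) -> torch.Tensor:
    """TIES (simplified): keep only deltas matching the elected majority sign."""
    stacked = torch.stack(deltas)                  # (n_models, ...)
    elected = torch.sign(stacked.sum(dim=0))       # per-weight majority sign
    agree = (torch.sign(stacked) == elected).float()
    # Average the agreeing deltas; clamp avoids division by zero.
    return (stacked * agree).sum(0) / agree.sum(0).clamp(min=1.0)

# Toy example: a base tensor and three "fine-tuned" variants of it.
base = torch.randn(4, 4)
finetunes = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
deltas = [dare(ft - base, drop_rate=0.5) for ft in finetunes]
merged = base + ties_merge(deltas)
print(merged.shape)
```

The 1/(1 - drop_rate) rescaling keeps the expected magnitude of each delta unchanged, which is what lets aggressive dropping thin out redundant parameters without degrading the merged weights.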
Key Capabilities & Performance
This model demonstrates solid performance across the benchmarks of the Open LLM Leaderboard. Key scores include:
- Avg. Score: 72.36
- ARC (25-shot): 68.52
- HellaSwag (10-shot): 86.51
- MMLU (5-shot): 64.88
- GSM8K (5-shot): 72.18
These results indicate proficiency in reasoning, commonsense inference, and mathematical problem solving. The model supports both the ChatML prompt template and a custom system-prompt template; the ChatML layout is sketched below.
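For reference, ChatML wraps each conversational turn in <|im_start|>/<|im_end|> markers. A minimal sketch, with placeholder message contents rather than the model card's recommended system prompt:

```python
# Minimal ChatML prompt builder; the system and user strings below are
# placeholders, not supermario-v2's recommended prompts.
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a helpful assistant.", "What is 17 * 23?"))
```

Ending the prompt with the open assistant turn cues the model to generate its reply in the same format.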
Unique Aspects
What sets supermario-v2 apart is its DARE_TIES merge, which combines specialized fine-tunes into a balanced, robust general-purpose LLM without any additional training. This lets it retain the strengths of its constituent models, such as conversational fluency from OpenHermes and mathematical reasoning from MetaMath, within a single 7B-parameter footprint. The model is designed to run efficiently on local machines, and Jan HQ promotes its use via their offline-first Jan Desktop application; a loading sketch follows below.
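Outside the Jan Desktop app, the checkpoint can presumably be loaded like any other Mistral-family model with the Hugging Face transformers library. A minimal sketch; the dtype, device placement (which relies on the accelerate package), and generation settings are assumptions, not recommended defaults:

```python
# Sketch of local inference with transformers; dtype, device_map, and
# max_new_tokens are illustrative choices, not the model's documented defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jan-hq/supermario-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# ChatML-formatted prompt with placeholder contents.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain the DARE merge method in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```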