Model Overview
jan-hq/supermario-slerp-v3 is a 7-billion-parameter language model developed by Jan HQ. It is the product of a model-merging experiment that uses the Slerp (spherical linear interpolation) merge method to combine two prior models, supermario-slerp-v2 and supermario-v2, with the aim of producing a model that is more robust and versatile than either parent alone.
Key Capabilities & Performance
This model demonstrates strong general-purpose language understanding and generation. Its performance has been evaluated on the Open LLM Leaderboard, where it achieved an average score of 72.22. Specific benchmark results include:
- AI2 Reasoning Challenge (25-shot): 69.28
- HellaSwag (10-shot): 86.71
- MMLU (5-shot): 65.11
- TruthfulQA (0-shot): 61.77
- Winogrande (5-shot): 80.51
- GSM8k (5-shot): 69.98
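The leaderboard average is simply the unweighted mean of the six task scores. Recomputing it from the rounded per-task numbers above gives approximately 72.23; the reported 72.22 presumably averages the unrounded scores. A quick check:

```python
# Per-task scores as reported above (already rounded to two decimals).
scores = {
    "ARC (25-shot)": 69.28,
    "HellaSwag (10-shot)": 86.71,
    "MMLU (5-shot)": 65.11,
    "TruthfulQA (0-shot)": 61.77,
    "Winogrande (5-shot)": 80.51,
    "GSM8k (5-shot)": 69.98,
}

# Unweighted mean over the six tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # ~72.23 from these rounded inputs
```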
Unique Aspects
- Slerp Merge Method: This model reflects Jan HQ's exploration of advanced model-merging techniques. Slerp interpolates along the arc between two parent models' weight vectors rather than averaging them linearly, with the goal of combining existing strong models into a better-performing one.
- Open-Source Ecosystem Focus: Jan HQ is committed to building infrastructure and tooling for the open-source AI ecosystem, with this model serving as part of their ongoing research and development efforts.
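The Slerp idea can be sketched as follows: instead of a straight-line (linear) average of two parents' weights, spherical linear interpolation moves along the arc between them, weighting each endpoint by sines of the angle between the vectors. Below is a minimal, illustrative per-tensor sketch; real merge tooling applies this layer by layer over full model checkpoints, so treat the function and its signature as hypothetical:

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1, values in between move along the arc.
    """
    # Angle between the two vectors via the normalized dot product.
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)

    # Nearly parallel vectors: fall back to plain linear interpolation.
    if theta < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    s = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Midpoint between two orthogonal unit vectors lies on the unit circle,
# whereas a linear average would land inside it (norm < 1).
mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

This norm-preserving behavior is the usual motivation for Slerp over linear averaging when merging model weights.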
Usage
This model can be run with Jan Desktop, an open-source, offline-first ChatGPT alternative for Mac, Windows, and Linux. Because inference runs locally, conversations remain private, and the app exposes OpenAI-compatible endpoints for interacting with the local server.
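Since the local server speaks the OpenAI-compatible chat-completions format, a request body looks like the sketch below. The base URL, port, and model id are assumptions that depend on your Jan install and how the model is named there:

```python
import json

# Hypothetical local endpoint; the address/port depend on your Jan setup.
BASE_URL = "http://localhost:1337/v1"

# An OpenAI-compatible chat-completions request body.
payload = {
    "model": "supermario-slerp-v3",  # model id as loaded in Jan (assumed name)
    "messages": [
        {"role": "user", "content": "Explain what a Slerp model merge is."}
    ],
    "temperature": 0.7,
}

# To send it with the local server running, POST the JSON to
# f"{BASE_URL}/chat/completions" with a Content-Type: application/json
# header, e.g. via urllib.request or any HTTP client.
body = json.dumps(payload)
```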