jan-hq/supermario-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 11, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

jan-hq/supermario-slerp is a 7-billion-parameter language model created by jan-hq. It merges Seraph-7B and Marcoroni-7B-v3 using the Slerp method, with Mistral-7B-v0.1 as the base model, and was released as a test project for exploring model merging techniques. It achieves an average score of 72.32 on the Open LLM Leaderboard, demonstrating capability across reasoning, common sense, and language understanding tasks.


Model Overview

jan-hq/supermario-slerp is a 7-billion-parameter language model developed by jan-hq as a test project for model merging. It merges two existing models, Seraph-7B and Marcoroni-7B-v3, using the Slerp (spherical linear interpolation) method, with Mistral-7B-v0.1 as the base model.
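Slerp interpolates between two models' weight tensors along the arc between them rather than along a straight line, which preserves each parent's weight geometry better than plain linear averaging. The NumPy sketch below illustrates the underlying formula only; the function name, the per-tensor treatment, and the interpolation factor `t` are assumptions for illustration, not the exact tooling or settings used to build this model.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0 (e.g. one parent's tensor), t=1 returns v1
    (the other parent's tensor). Illustrative sketch only.
    """
    # Normalize copies to find the angle between the two weight directions.
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.sum(u0 * u1), -1.0, 1.0)
    omega = np.arccos(dot)
    if omega < eps:
        # Nearly parallel tensors: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    sin_omega = np.sin(omega)
    # Blend along the arc; coefficients sum to 1 at the endpoints.
    return (np.sin((1.0 - t) * omega) / sin_omega) * v0 + \
           (np.sin(t * omega) / sin_omega) * v1
```

In practice, a merge like this is applied tensor by tensor across both checkpoints, and the interpolation factor is often varied by layer type.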

Key Capabilities & Performance

This model demonstrates general language understanding and reasoning abilities, as evaluated on the Open LLM Leaderboard. Its performance metrics include:

  • Avg. Score: 72.32
  • ARC (25-shot): 68.94
  • HellaSwag (10-shot): 86.58
  • MMLU (5-shot): 64.93
  • TruthfulQA (0-shot): 60.11
  • Winogrande (5-shot): 81.29
  • GSM8K (5-shot): 72.10

Detailed evaluation results are available on the Open LLM Leaderboard.

Intended Use

This model serves as an example of a merged model built with the Slerp method. It can be run locally using Jan Desktop, an open-source, offline-first ChatGPT alternative. Jan provides a local server with OpenAI-compatible endpoints, keeping conversations and model settings private and under local control.
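As a minimal sketch, once Jan's local server is running with this model loaded, any OpenAI-compatible client can query it. The base URL below uses Jan's documented default port, and the model identifier is an assumption; check both against your Jan settings.

```python
from openai import OpenAI

# Point the standard OpenAI client at Jan's local server.
# Base URL/port are Jan's defaults; adjust to match your settings.
client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="supermario-slerp",  # assumed model id; check Jan's model list
    messages=[{"role": "user", "content": "Explain Slerp model merging in one paragraph."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Because the local endpoint speaks the OpenAI API, existing tooling written against that API works against Jan without code changes.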