Praneeth/StarMix-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Jan 11, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Praneeth/StarMix-7B-slerp is a 7 billion parameter language model created by Praneeth by merging Starling-LM-7B-alpha and Mistral-7B-Instruct-v0.2 with the slerp (spherical linear interpolation) method. The merge is intended to combine the strengths of both base models, and the result scores competitively across standard benchmarks. It is suited to general-purpose conversational AI and instruction-following tasks, with a context length of 4096 tokens.


StarMix-7B-slerp Overview

StarMix-7B-slerp is a 7 billion parameter language model developed by Praneeth, created through a strategic merge of two prominent base models: Starling-LM-7B-alpha and Mistral-7B-Instruct-v0.2. This merge was performed using the slerp (spherical linear interpolation) method via mergekit, aiming to combine and balance the capabilities of its constituents.
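For intuition, slerp interpolates along the great-circle arc between two parameter vectors rather than along the straight line used by plain weight averaging, which better preserves the geometry of each model's weights. The Python sketch below shows the core operation as it would be applied tensor-by-tensor; it is an illustrative reimplementation under stated assumptions, not mergekit's actual code, and the function name and tolerances are chosen for this example:

```python
import numpy as np

def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t is the interpolation factor in [0, 1]: 0 returns v0, 1 returns v1.
    """
    v0_f = v0.flatten().astype(np.float64)
    v1_f = v1.flatten().astype(np.float64)
    # Measure the angle between the two parameter vectors on unit copies.
    n0 = v0_f / (np.linalg.norm(v0_f) + eps)
    n1 = v1_f / (np.linalg.norm(v1_f) + eps)
    dot = np.clip(np.dot(n0, n1), -1.0, 1.0)
    if abs(dot) > dot_threshold:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        merged = (1 - t) * v0_f + t * v1_f
    else:
        theta = np.arccos(dot)
        s0 = np.sin((1 - t) * theta) / np.sin(theta)
        s1 = np.sin(t * theta) / np.sin(theta)
        merged = s0 * v0_f + s1 * v1_f
    return merged.reshape(v0.shape).astype(v0.dtype)

# e.g. an even blend of the corresponding tensors from the two base models:
# merged_tensor = slerp(0.5, starling_tensor, mistral_tensor)
```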

Key Characteristics & Performance

This model demonstrates competitive performance across a range of benchmarks, as evaluated on the Open LLM Leaderboard. Its average score is 67.41, with notable results in specific areas:

  • Reasoning: 65.36 on the AI2 Reasoning Challenge (25-shot).
  • Common Sense: 85.10 on HellaSwag (10-shot) and 79.95 on Winogrande (5-shot).
  • Knowledge: 62.57 on MMLU (5-shot).
  • Truthfulness: 57.81 on TruthfulQA (0-shot).
  • Mathematical Reasoning: 53.68 on GSM8k (5-shot).
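The leaderboard average is simply the arithmetic mean of the six task scores above, as this quick check (values copied from the list) confirms:

```python
scores = [65.36, 85.10, 79.95, 62.57, 57.81, 53.68]
average = sum(scores) / len(scores)
print(f"{average:.2f}")  # 67.41, matching the reported leaderboard average
```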

Use Cases

StarMix-7B-slerp is well-suited to applications that need a balanced instruction-following model within a 4096-token context window. Because it blends the weights of a strong chat model (Starling-LM-7B-alpha) and a strong instruction-tuned model (Mistral-7B-Instruct-v0.2), it is a reasonable candidate for conversational assistants, summarization, and general analytical tasks.
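A minimal sketch of loading the model for chat-style generation with the Hugging Face transformers library, assuming the weights are published under the Praneeth/StarMix-7B-slerp repository and that the tokenizer ships a chat template (both assumptions, not verified here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Praneeth/StarMix-7B-slerp"  # repo id as shown on this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a 7B model in fp16 fits on a single 24 GB GPU
    device_map="auto",          # requires the accelerate package
)

# Build a single-turn prompt; assumes the tokenizer defines a chat template.
messages = [{"role": "user", "content": "Summarize what a slerp model merge does."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```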