Q-bert/MetaMath-Cybertron-Starling

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Dec 5, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer

Q-bert/MetaMath-Cybertron-Starling is a 7 billion parameter language model created by Q-bert by merging MetaMath-Cybertron and Starling-LM-7B-alpha. The model targets general language tasks and shows strong results across benchmarks covering reasoning and commonsense inference. It supports a 4096-token context length and is designed for instruction-following applications using the ChatML format.


Overview

Q-bert/MetaMath-Cybertron-Starling is a 7 billion parameter language model developed by Q-bert. It was created through a slerp merge of two distinct models: Q-bert/MetaMath-Cybertron and berkeley-nest/Starling-LM-7B-alpha. This merging strategy aims to combine the strengths of both base models.
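The core idea of a slerp (spherical linear interpolation) merge is to interpolate between corresponding weight tensors of the two parent models along the arc of a hypersphere rather than along a straight line, which tends to preserve the geometry of the weights better than plain averaging. The model card does not publish the exact merge configuration, so the following is only a minimal sketch of the slerp formula on plain Python lists, not Q-bert's actual merge script:

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t values move along
    the arc between them. This is the building block of slerp merging.
    """
    # Angle between the two vectors, computed from their normalized dot product.
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # clamp against floating-point drift
    omega = math.acos(dot)
    if omega < eps:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

In a real merge this interpolation is applied per tensor (often with different `t` values for different layer types); a full pipeline would typically use a tool such as mergekit rather than hand-rolled code like this.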

Key Capabilities & Performance

This model demonstrates solid performance across a range of general language understanding and reasoning tasks, as evidenced by its evaluation on the Open LLM Leaderboard. Key benchmark results include:

  • Average Score: 71.35
  • ARC (25-shot): 67.75
  • HellaSwag (10-shot): 86.23
  • MMLU (5-shot): 65.24
  • TruthfulQA (0-shot): 55.94
  • Winogrande (5-shot): 81.45
  • GSM8K (5-shot): 71.49

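The leaderboard average is the unweighted mean of the six benchmark scores above, which can be checked directly:

```python
# Open LLM Leaderboard scores reported for this model.
scores = {
    "ARC (25-shot)": 67.75,
    "HellaSwag (10-shot)": 86.23,
    "MMLU (5-shot)": 65.24,
    "TruthfulQA (0-shot)": 55.94,
    "Winogrande (5-shot)": 81.45,
    "GSM8K (5-shot)": 71.49,
}

# Unweighted mean, rounded to two decimals as the leaderboard reports it.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 71.35
```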
Usage

The model uses the ChatML prompt format for instruction-following. With a 4096-token context length, it suits conversational and text generation tasks that call for balanced performance across general benchmarks.
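ChatML wraps each turn in `<|im_start|>role` / `<|im_end|>` markers, ending with an open assistant turn for the model to complete. The helper below is a minimal illustration of that format (the function name and messages are my own, not from the model card):

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    The prompt ends with an open assistant turn so the model's generation
    continues from there.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # open turn for the model to fill
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 12 * 7?"},
])
print(prompt)
```

When the model's tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the `transformers` library produces the equivalent string without hand-formatting.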