Q-bert/MetaMath-Cybertron-Starling
Q-bert/MetaMath-Cybertron-Starling is a 7 billion parameter language model created by Q-bert by merging MetaMath-Cybertron and Starling-LM-7B-alpha. It targets general language tasks and performs well across a range of benchmarks, including reasoning and commonsense tests. It supports a 4096-token context length and uses the ChatML prompt format for instruction following.
Overview
Q-bert/MetaMath-Cybertron-Starling is a 7 billion parameter language model developed by Q-bert. It was created through a slerp (spherical linear interpolation) merge of two distinct models: Q-bert/MetaMath-Cybertron and berkeley-nest/Starling-LM-7B-alpha. This merging strategy aims to combine the strengths of both base models; a minimal sketch of the interpolation is shown below.
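Slerp interpolates along the arc between two weight tensors rather than along a straight line, which preserves the magnitude of the interpolated weights better than plain averaging. The sketch below illustrates the operation in PyTorch; the interpolation factor `t` and any per-layer schedule used for this particular merge are not documented here, so treat this as an illustration of the technique rather than the exact recipe.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Illustrative only: merge tools such as mergekit apply an operation
    like this per parameter tensor when performing a slerp merge.
    """
    # Cosine of the angle between the (flattened) tensors
    v0_n = v0 / (v0.norm() + eps)
    v1_n = v1 / (v1.norm() + eps)
    dot = torch.clamp((v0_n * v1_n).sum(), -1.0, 1.0)
    omega = torch.arccos(dot)      # angle between the two tensors
    sin_omega = torch.sin(omega)
    if sin_omega.abs() < eps:
        # Nearly colinear tensors: fall back to linear interpolation
        return (1.0 - t) * v0 + t * v1
    # Interpolate along the arc connecting v0 and v1
    return (torch.sin((1.0 - t) * omega) / sin_omega) * v0 \
         + (torch.sin(t * omega) / sin_omega) * v1

# Hypothetical usage over two state dicts a and b with matching keys:
# merged = {name: slerp(0.5, a[name], b[name]) for name in a}
```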
Key Capabilities & Performance
This model demonstrates solid performance across a range of general language understanding and reasoning tasks, as evaluated on the Open LLM Leaderboard. Key benchmark results include (a reproduction sketch follows the list):
- Average Score: 71.35
- ARC (25-shot): 67.75
- HellaSwag (10-shot): 86.23
- MMLU (5-shot): 65.24
- TruthfulQA (0-shot): 55.94
- Winogrande (5-shot): 81.45
- GSM8K (5-shot): 71.49
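These scores can in principle be reproduced with EleutherAI's lm-evaluation-harness, which the Open LLM Leaderboard is built on. Below is a minimal sketch for a single task, assuming the harness's v0.4+ Python API (`lm_eval.simple_evaluate`) and enough GPU memory for a 7B model; exact numbers will also depend on the harness version and prompt settings used by the leaderboard.

```python
import lm_eval  # pip install lm-eval (EleutherAI lm-evaluation-harness)

# Evaluate GSM8K at 5-shot, matching the leaderboard setting above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Q-bert/MetaMath-Cybertron-Starling,dtype=auto",
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(results["results"]["gsm8k"])
```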
Usage
The model uses the ChatML prompt format for instruction following. With a 4096-token context window, it is suited to conversational and general text generation tasks where balanced performance across benchmarks is desired. A minimal usage sketch follows.
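The sketch below loads the model with Hugging Face transformers and builds a ChatML prompt by hand. The `<|im_start|>`/`<|im_end|>` markers follow the ChatML convention named above; the system message, question, and generation settings are illustrative assumptions, not documented defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Q-bert/MetaMath-Cybertron-Starling"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # requires the accelerate package
    torch_dtype="auto",
)

# ChatML prompt: each turn is wrapped in <|im_start|>role ... <|im_end|>
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 17 * 24?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, dropping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```

If the repository ships a chat template with its tokenizer, `tokenizer.apply_chat_template` can build the same prompt from a list of role/content messages instead of the hand-written string above.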