Weyaxi/Seraph-7B

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4K · Published: Dec 11, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer

Seraph-7B is a 7-billion-parameter language model developed by Weyaxi, built on the Mistral-7B-v0.1 base model via a slerp merge of Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp and Q-bert/MetaMath-Cybertron-Starling. The model is instruction-tuned and achieves an average score of 71.86 on the Open LLM Leaderboard, with strong results on reasoning and common-sense benchmarks. It is suitable for general-purpose conversational AI and tasks requiring robust language understanding.


Seraph-7B: A Merged 7B Instruction-Tuned Model

Seraph-7B is a 7 billion parameter instruction-tuned language model developed by Weyaxi. It is constructed using mergekit with a slerp (spherical linear interpolation) method, combining the strengths of two distinct models: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp and Q-bert/MetaMath-Cybertron-Starling, both layered over a mistralai/Mistral-7B-v0.1 base.
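To illustrate the merge method, here is a minimal NumPy sketch of spherical linear interpolation (slerp). This is not mergekit's actual implementation; in a real merge, a function like this is applied tensor-by-tensor across the two models' weights, with the interpolation factor `t` (possibly varying per layer) controlling how much each parent contributes.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between weight vectors v0 and v1.

    t=0 returns v0, t=1 returns v1; intermediate values follow the
    arc between the two directions rather than a straight line.
    """
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

Compared with a plain weighted average, slerp preserves the geometric relationship between the two parents' weight directions, which is why it is a popular choice for model merging.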

Key Capabilities & Performance

This model demonstrates competitive performance on the Open LLM Leaderboard, achieving an average score of 71.86. Notable benchmark results include:

  • ARC (25-shot): 67.83
  • HellaSwag (10-shot): 86.22
  • MMLU (5-shot): 65.07
  • GSM8K (5-shot): 71.87

These scores indicate strong capabilities in common sense reasoning, language understanding, and mathematical problem-solving.

Usage and Integration

Seraph-7B is designed for instruction-following tasks and is recommended for use with the ChatML prompt template, though an Alpaca-style template is also provided. For deployment flexibility, quantized versions (GPTQ, GGUF, AWQ) are available through TheBloke, enabling efficient inference on various hardware configurations.
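The recommended ChatML template wraps each turn in `<|im_start|>`/`<|im_end|>` markers. A small helper like the following (the function name and system prompt are illustrative, not part of the model card) shows the expected format:

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a single-turn ChatML prompt, ending at the assistant turn
    so the model generates the reply."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful assistant.",   # example system message (assumption)
    "Explain slerp in one sentence.",
)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to complete, and generation is typically stopped at the next `<|im_end|>` token.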

Good for:

  • General-purpose conversational AI applications.
  • Tasks requiring robust instruction following.
  • Scenarios where a balance of performance and efficiency is desired from a 7B model.