uukuguy/speechless-mistral-six-in-one-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4K · Published: Oct 15, 2023 · License: llama2 · Architecture: Transformer · Open Weights

uukuguy/speechless-mistral-six-in-one-7b is a 7-billion-parameter language model created by uukuguy by merging six state-of-the-art Mistral-7B based models. It inherits Grouped-Query Attention and Sliding-Window Attention from its Mistral-7B base and rates highly in community evaluations of intellect, creativity, and problem-solving. It is designed for general-purpose conversational AI and complex reasoning tasks, offering a highly capable model for its size.


Model Overview

speechless-mistral-six-in-one-7b is a 7-billion-parameter language model developed by uukuguy. It is a composite model created by merging six state-of-the-art Mistral-7B based models: dolphin-2.1-mistral-7b, Mistral-7B-OpenOrca, mistral-7b-platypus-fp16, samantha-1.2-mistral-7b, CollectiveCognition-v1.1-Mistral-7B, and zephyr-7b-alpha. The merge aims to combine the strengths of its constituent models in a single checkpoint.
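
The card does not publish the exact merge recipe. As a rough illustration of the general technique, the sketch below averages the weights of two same-architecture checkpoints uniformly; the repo IDs and the uniform weighting are assumptions for illustration, not the recipe uukuguy used.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative two-way merge. Repo IDs are placeholders for two of the six
# constituents, and uniform averaging is an assumption: the actual recipe
# behind speechless-mistral-six-in-one-7b is not published.
model_ids = [
    "ehartford/dolphin-2.1-mistral-7b",
    "Open-Orca/Mistral-7B-OpenOrca",
]

models = [
    AutoModelForCausalLM.from_pretrained(mid, torch_dtype=torch.float16)
    for mid in model_ids
]
state_dicts = [m.state_dict() for m in models]

merged = models[0]  # reuse the first model as the container for merged weights
with torch.no_grad():
    for name, param in merged.state_dict().items():
        # Same architecture means the tensors line up key-for-key,
        # so they can simply be averaged elementwise.
        avg = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
        param.copy_(avg.to(param.dtype))

merged.save_pretrained("mistral-7b-two-way-merge")
```

Production merges typically use dedicated tooling such as mergekit with more careful per-layer weighting, but the principle is the same: same-shape tensors combined layer by layer.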

Key Capabilities & Performance

This model demonstrates strong performance across a range of metrics, as highlighted by community benchmarks:

  • Intellect & Reasoning: Rated highly for comprehensive knowledge and logical reasoning.
  • Creativity: Shows impressive creative talents with unique and nuanced responses.
  • Adaptability: Capable of flexible conversation across diverse topics, adapting to contextual cues.
  • Problem-Solving: Addresses questions comprehensively, considering multiple perspectives.
  • General Language Understanding: Achieves an average score of 53.38 on the Open LLM Leaderboard, including 84.6 on HellaSwag and 63.29 on MMLU (one way to reproduce such scores is sketched after this list).
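
The Open LLM Leaderboard is backed by EleutherAI's lm-evaluation-harness, so the scores can in principle be reproduced locally. A minimal sketch, assuming lm-eval v0.4+ is installed; exact task versions and prompts on the hosted leaderboard may differ, so small deviations are expected:

```python
# Sketch of reproducing the HellaSwag number with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=uukuguy/speechless-mistral-six-in-one-7b,dtype=float16",
    tasks=["hellaswag"],
    num_fewshot=10,  # the leaderboard evaluates HellaSwag 10-shot
    batch_size=8,
)
print(results["results"]["hellaswag"])
```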

Architectural Features

Because all six constituents share the Mistral-7B base, the merged model inherits its architectural features:

  • Grouped-Query Attention: Shares each key/value head across a group of query heads, shrinking the KV cache and speeding up decoding.
  • Sliding-Window Attention: Restricts each token's attention to a fixed window of recent positions, keeping the cost of longer sequences manageable (both mechanisms are sketched after this list).
  • Byte-fallback BPE tokenizer: Falls back to raw bytes for characters outside the vocabulary, so no input is ever unrepresentable.
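
Neither mechanism is specific to this merge; both come with the Mistral-7B base. The toy sketch below shows how the two combine: a causal mask truncated to a sliding window, plus key/value heads repeated to serve groups of query heads. Head counts follow Mistral-7B's published config (32 query heads, 8 KV heads); the window is shrunk from 4096 to 4 so the example runs instantly, and the function names are illustrative.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Position i may attend to positions j with i - window < j <= i:
    # causal attention, truncated to the most recent `window` tokens.
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

def grouped_query_attention(q, k, v, mask):
    # q: (batch, n_q_heads, seq, dim); k and v: (batch, n_kv_heads, seq, dim).
    # Each KV head is repeated to serve a whole group of query heads.
    n_rep = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(n_rep, dim=1)
    v = v.repeat_interleave(n_rep, dim=1)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy shapes: Mistral-7B uses 32 query heads over 8 KV heads (groups of 4).
q = torch.randn(1, 32, 16, 128)
k = torch.randn(1, 8, 16, 128)
v = torch.randn(1, 8, 16, 128)
out = grouped_query_attention(q, k, v, sliding_window_causal_mask(16, window=4))
print(out.shape)  # torch.Size([1, 32, 16, 128])
```

Storing only 8 KV heads is what shrinks the KV cache by 4x relative to full multi-head attention, which matters most at inference time when the cache dominates memory use.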

Good For

  • General-purpose conversational AI: Its high ratings in intellect, creativity, and communication make it suitable for engaging dialogue systems.
  • Complex reasoning tasks: Excels in problem-solving and logical reasoning, making it useful for applications requiring analytical thought.
  • Applications requiring a capable model at 7B parameters: Offers strong performance for its size, potentially reducing computational overhead compared to larger models.
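
Getting started requires nothing beyond the standard Hugging Face stack. A minimal generation sketch, assuming fp16 weights fit on the available GPU; note that the prompt format may matter in practice, since the six constituents were tuned with different chat templates:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uukuguy/speechless-mistral-six-in-one-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # roughly 14 GB of GPU memory for 7B fp16 weights
    device_map="auto",
)

prompt = "Explain grouped-query attention in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```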