Manolo26/metis-chat-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 24, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Manolo26/metis-chat-7b is a 7 billion parameter language model created by Manolo26 by merging mlabonne/NeuralBeagle14-7B and mlabonne/NeuralHermes-2.5-Mistral-7B with the slerp method. The merge combines the strengths of its base models into a versatile chat-optimized model with a 4096-token context length, designed for general conversational AI applications and text generation tasks.


Metis-Chat-7B: A Merged Language Model

Metis-Chat-7B is a 7 billion parameter language model developed by Manolo26, created through a strategic merge of two prominent base models: mlabonne/NeuralBeagle14-7B and mlabonne/NeuralHermes-2.5-Mistral-7B. This model leverages the LazyMergekit tool, specifically employing the slerp (spherical linear interpolation) merge method to combine the weights of its constituents.

Key Capabilities & Configuration

  • Merged Architecture: Combines the strengths of NeuralBeagle14-7B and NeuralHermes-2.5-Mistral-7B to enhance overall performance in chat-based applications.
  • Parameter Count: Operates with 7 billion parameters, balancing performance with computational efficiency.
  • Context Length: Supports a context window of 4096 tokens, suitable for engaging in moderately long conversations.
  • Merge Method: Utilizes slerp for merging, with separate interpolation factors (t) applied to the self_attn and mlp layers, tuning each base model's contribution per layer type.
  • Data Type: Configured to use bfloat16 for efficient inference.
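Slerp interpolates between corresponding weight tensors along an arc on a hypersphere rather than a straight line, which tends to preserve tensor norms better than plain linear averaging. A minimal NumPy sketch of the core formula (mergekit's actual implementation additionally handles per-layer t schedules and other edge cases):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values follow
    the great-circle arc between the two (flattened) tensors.
    """
    v0f = v0.ravel().astype(np.float64)
    v1f = v1.ravel().astype(np.float64)
    # Cosine of the angle between the flattened tensors.
    dot = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f))
    dot = np.clip(dot, -1.0, 1.0)
    omega = np.arccos(dot)
    # Near-parallel tensors: fall back to linear interpolation.
    if abs(np.sin(omega)) < eps:
        return (1.0 - t) * v0 + t * v1
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return s0 * v0 + s1 * v1
```

In a real merge this function would be applied tensor-by-tensor across both checkpoints, with t chosen per layer group (e.g. different schedules for self_attn and mlp, as noted above).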

Ideal Use Cases

  • General Chatbots: Well-suited for developing conversational AI agents that require robust language understanding and generation.
  • Text Generation: Can be used for various text generation tasks, benefiting from the combined knowledge and stylistic capabilities of its merged components.
  • Experimentation: Provides a solid base for researchers and developers looking to experiment with merged models and their performance characteristics.
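Since NeuralHermes-2.5-Mistral-7B was trained on the ChatML prompt format, the merged model likely expects the same layout, though the tokenizer's own chat template should be treated as authoritative. A hypothetical helper that assembles a ChatML prompt, assuming that format carries over:

```python
def build_chatml_prompt(messages):
    """Format a list of {"role", "content"} dicts as a ChatML prompt.

    Assumption: the merged model inherits the ChatML template from
    NeuralHermes-2.5-Mistral-7B; verify against the model's tokenizer
    chat template before relying on this.
    """
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    # Cue the model to begin the assistant turn.
    prompt += "<|im_start|>assistant\n"
    return prompt
```

The resulting string can be passed to any standard text-generation pipeline; with the 4096-token context window, long conversations should be truncated from the oldest turns.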