Aditya685/Upshot-NeuralHermes-2.5-Mistral-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4K · Published: Feb 4, 2024 · License: MIT · Architecture: Transformer · Open Weights

Aditya685/Upshot-NeuralHermes-2.5-Mistral-7B-slerp is a 7 billion parameter language model created by Aditya685, merged with the slerp method from mlabonne/NeuralHermes-2.5-Mistral-7B and Aditya685/upshot-sih. It supports a 4096-token context window and is designed for general text generation tasks, combining the strengths of its base models. It is suitable for applications that call for a capable merged 7B model.

Overview

Aditya685/Upshot-NeuralHermes-2.5-Mistral-7B-slerp is a 7 billion parameter language model resulting from a merge of two distinct models: mlabonne/NeuralHermes-2.5-Mistral-7B and Aditya685/upshot-sih. This merge was performed using the slerp (spherical linear interpolation) method via LazyMergekit.

Key Characteristics

  • Architecture: Based on the Mistral 7B architecture, providing a strong foundation for various NLP tasks.
  • Merging Strategy: Utilizes the slerp merge method, which combines the weights of the constituent models to potentially enhance performance across different capabilities.
  • Context Length: Supports a context window of 4096 tokens, allowing for processing and generating moderately long texts.
  • Configuration: The merge configuration specifies how the self_attn and mlp layers from each base model are weighted during the slerp process, indicating a tailored approach to combining their features (see the sketch after this list).
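
The exact LazyMergekit configuration is not reproduced here, but the interpolation it performs is straightforward. The sketch below is a minimal, illustrative implementation of spherical linear interpolation over a pair of weight tensors, assuming PyTorch; it is not the mergekit implementation itself.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate values follow the
    great-circle arc between the flattened, normalized tensors.
    """
    v0_flat = v0.flatten().float()
    v1_flat = v1.flatten().float()

    # Normalize so the tensors can be treated as directions on a hypersphere.
    v0_unit = v0_flat / (v0_flat.norm() + eps)
    v1_unit = v1_flat / (v1_flat.norm() + eps)

    # Angle between the two weight vectors.
    dot = torch.clamp(torch.dot(v0_unit, v1_unit), -1.0, 1.0)
    omega = torch.arccos(dot)

    # Nearly parallel tensors: fall back to plain linear interpolation.
    if omega.abs() < 1e-5:
        return ((1.0 - t) * v0_flat + t * v1_flat).reshape(v0.shape).to(v0.dtype)

    sin_omega = torch.sin(omega)
    s0 = torch.sin((1.0 - t) * omega) / sin_omega
    s1 = torch.sin(t * omega) / sin_omega
    return (s0 * v0_flat + s1 * v1_flat).reshape(v0.shape).to(v0.dtype)

# Hypothetical usage: blend a layer's weights 50/50 from the two parents.
# merged_weight = slerp(0.5, neuralhermes_layer.weight, upshot_layer.weight)
```

In a mergekit slerp configuration, the interpolation factor t typically varies per layer and per component (e.g., separate schedules for self_attn and mlp tensors), which is the tailoring described in the Configuration item above.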

Intended Use Cases

This model is suitable for general text generation, conversational AI, and other natural language processing applications where a 7B parameter model with a 4096-token context is appropriate. Because it is a merge rather than a newly trained model, its behavior reflects a blend of its two parent models.
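
If the weights are published under this repository name on the Hugging Face Hub, a standard transformers loading snippet would look like the following; the prompt and generation settings are illustrative only, and `device_map="auto"` assumes the accelerate package is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aditya685/Upshot-NeuralHermes-2.5-Mistral-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across available devices
)

prompt = "Explain what a slerp model merge does, in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```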