tushar310/MisGemma-7B

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Mar 14, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

MisGemma-7B is a 7 billion parameter language model created by tushar310 by merging EmbeddedLLM/Mistral-7B-Merge-14-v0.1 and HuggingFaceH4/zephyr-7b-beta. The merge uses slerp (spherical linear interpolation) to combine the strengths of both parents into a single versatile model. It is designed for general-purpose text generation and understanding tasks, building on the Mistral architecture and Zephyr's instruction tuning.

MisGemma-7B: A Merged Language Model

MisGemma-7B is a 7 billion parameter language model developed by tushar310, created by merging two base models with mergekit:

  • EmbeddedLLM/Mistral-7B-Merge-14-v0.1
  • HuggingFaceH4/zephyr-7b-beta

Key Characteristics

This model uses slerp (spherical linear interpolation) to merge the weights of its constituent models, interpolating along the geodesic between corresponding weight tensors rather than averaging them linearly. The merge applies different interpolation factors to different layers and components, as reflected in the configuration sketch after this list:

  • Self-attention layers: the interpolation factor varies with layer depth, ranging from 0 to 1 through intermediate values of 0.5, 0.3, and 0.7.
  • MLP (Multi-Layer Perceptron) layers: the interpolation factor likewise spans 0 to 1, through intermediate values of 0.5, 0.7, and 0.3.
  • Other parameters: a constant interpolation factor of 0.5 is applied.
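
These values correspond to the layer-wise `t` schedule that mergekit's slerp method expresses as per-filter vectors. A minimal sketch of such a config follows; the `layer_range`, `base_model` choice, and exact `t` vectors are assumptions inferred from the description above, since the full config is not reproduced here.

```yaml
# Representative mergekit slerp config (layer ranges and base_model assumed).
slices:
  - sources:
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1  # assumed; not stated in the card
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # attention interpolation across layer depth
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # MLP interpolation across layer depth
    - value: 0.5                     # constant factor for all other tensors
dtype: bfloat16
```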

Intended Use

MisGemma-7B is designed to inherit and combine the capabilities of its base models, making it suitable for a broad range of natural language processing tasks. Its lineage, the Mistral base architecture plus Zephyr's instruction tuning, suggests proficiency in areas such as:

  • General text generation
  • Conversational AI
  • Instruction following
  • Text summarization and analysis

The merge is stored in bfloat16, trading a small amount of numerical precision for lower memory use and efficient inference.
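
As a usage sketch, the snippet below loads the model in bfloat16 with Hugging Face transformers. The chat-style prompt and generation settings are illustrative, and it is assumed the repository ships a tokenizer with a chat template inherited from its Zephyr parent.

```python
# Minimal sketch: loading MisGemma-7B for text generation with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tushar310/MisGemma-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype the model was merged in
    device_map="auto",
)

# Assumes a chat template is present (e.g., inherited from zephyr-7b-beta).
messages = [{"role": "user", "content": "Summarize the benefits of model merging."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```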