MisGemma-7B: A Merged Language Model
MisGemma-7B is a 7-billion-parameter language model published by tushar310, created by merging two base models with mergekit:
- EmbeddedLLM/Mistral-7B-Merge-14-v0.1
- HuggingFaceH4/zephyr-7b-beta
Key Characteristics
This model uses the slerp (spherical linear interpolation) merge method to combine the weights of its constituent models. Rather than averaging weights linearly, slerp interpolates along the arc between the two weight tensors, and the merge applies a different interpolation factor t to different layers and components:
- Self-attention layers: the interpolation factor varies across layers between 0 and 1, with intermediate values of 0.5, 0.3, and 0.7.
- MLP (feed-forward) layers: the interpolation factor likewise varies between 0 and 1, with intermediate values of 0.5, 0.7, and 0.3.
- All other parameters: a default interpolation factor of 0.5.
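To make the interpolation concrete, here is a minimal pure-Python sketch of slerp over a flat weight vector (the function name and signature are illustrative, not mergekit's API; mergekit operates on full model tensors):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t moves along the
    great-circle arc between the directions of v0 and v1, which tends
    to preserve the magnitude structure that plain averaging flattens.
    """
    # Angle between the two vectors.
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))
    omega = math.acos(cos_omega)
    if omega < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    sin_omega = math.sin(omega)
    c0 = math.sin((1 - t) * omega) / sin_omega
    c1 = math.sin(t * omega) / sin_omega
    return [c0 * a + c1 * b for a, b in zip(v0, v1)]
```

For example, `slerp(0.5, [1.0, 0.0], [0.0, 1.0])` returns a point halfway along the arc between the two unit vectors, rather than the shorter chord midpoint `[0.5, 0.5]` that linear interpolation would give.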
Intended Use
MisGemma-7B is designed to inherit and combine the capabilities of its base models, making it suitable for a broad range of natural language processing tasks. Its lineage in Mistral-family weights and the instruction-tuned Zephyr suggests proficiency in areas such as:
- General text generation
- Conversational AI
- Instruction following
- Text summarization and analysis
The merged weights are stored in bfloat16, which halves the memory footprint relative to float32 while preserving float32's exponent range, trading a small amount of precision for cheaper inference.
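The layer-wise interpolation factors and dtype described above correspond to a mergekit slerp configuration of roughly the following shape. This is a sketch reconstructed from the description, not the exact file shipped with the model; in particular, the endpoint ordering of the per-layer gradients and the `layer_range` bounds are assumptions:

```yaml
slices:
  - sources:
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # gradient interpolated across layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # mirrored ordering is assumed
    - value: 0.5                      # default for all other parameters
dtype: bfloat16
```

mergekit expands each `value` list into a smooth per-layer gradient, so early and late layers lean toward one parent model while middle layers mix both.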