Undi95/Mistral-11B-v0.1 Overview
Undi95/Mistral-11B-v0.1 is a 10.7 billion parameter language model developed by Undi95. It is an expansion of the original Mistral-7B-v0.1, created by duplicating and merging layers from the base model. Undi95 experimented with this "frankenstein" merging method to increase the model's size, specifically noting the importance of keeping the first 8 layers of the original Mistral-7B at the start of the stack.
Key Characteristics
- Architecture: Based on the Mistral-7B-v0.1 model, with layers duplicated and merged.
- Parameter Count: 10.7 billion parameters, an increase from the base 7B model.
- Data Type: The model files are in bfloat16, consistent with the base data type of the original Mistral-7B-v0.1.
- Prompt Template: Utilizes the Alpaca prompt template for instruction following, structured as `Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:\n`.
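The Alpaca template above can be filled in with a small helper; this is an illustrative sketch (the function name and example instruction are not from the model card):

```python
# Alpaca-style prompt template as quoted in the model card.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
    "\n\n### Instruction:\n{prompt}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Fill the Alpaca template with a user instruction."""
    return ALPACA_TEMPLATE.format(prompt=instruction)

# Example: the resulting string is what gets sent to the model.
print(build_prompt("Summarize the plot of Hamlet in two sentences."))
```

The model's completion is whatever it generates after the trailing `### Response:` marker.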
Development Method
The model was constructed using a mergekit configuration that took layer ranges [0, 24] and [8, 32] from mistralai/Mistral-7B-v0.1 and stacked them using the passthrough merge method. Because the two ranges overlap, the middle layers of the base model appear twice, deepening the network from 32 to 48 transformer layers and accounting for the growth from 7B to 10.7B parameters. This approach aimed to expand the model's capacity while preserving the foundational knowledge of the Mistral architecture.
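A mergekit configuration implementing this kind of passthrough layer stack might look like the following sketch; the exact file the developer used is not reproduced in this card, so treat the field values as an approximation:

```yaml
# Sketch of a mergekit passthrough config: stack two overlapping
# layer ranges of Mistral-7B-v0.1 into one deeper model.
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

With passthrough merging, no weights are averaged; the selected layers are simply copied in order into the output model.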
Intended Use
This model is suitable for general language generation and instruction-following tasks, benefiting from its increased parameter count relative to the 7B base model. Its construction method also makes it an interesting candidate for studying how layer duplication affects model performance.