Model Overview
Gille/StrangeMerges_34-7B-slerp is a 7-billion-parameter language model developed by Gille, created with the merging technique known as slerp (spherical linear interpolation). It combines the characteristics of two base models:
- ContextualAI/Contextual_KTO_Mistral_PairRM
- Gille/StrangeMerges_30-7B-slerp
Merging Configuration
The merge was performed using LazyMergekit, applying the slerp method across all 32 layers of the constituent models. A notable aspect of this merge is the fine-grained control over the interpolation parameters (t):
- Self-attention blocks use a t value that varies across layers, ranging from 0 to 0.7.
- MLP blocks use a different t value range, from 0 to 1, also varying by layer.
- A default t value of 0.5 is applied where no specific filter matches.
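In mergekit's YAML configuration format, such a merge would look roughly like the sketch below. The per-layer value lists and the choice of base model are illustrative assumptions; only the t ranges are stated above, not the exact schedules:

```yaml
slices:
  - sources:
      - model: ContextualAI/Contextual_KTO_Mistral_PairRM
        layer_range: [0, 32]
      - model: Gille/StrangeMerges_30-7B-slerp
        layer_range: [0, 32]
merge_method: slerp
base_model: Gille/StrangeMerges_30-7B-slerp   # assumed; not stated above
parameters:
  t:
    - filter: self_attn
      value: [0, 0.3, 0.5, 0.7, 0.7]  # illustrative: varies across layers, 0 to 0.7
    - filter: mlp
      value: [0, 0.25, 0.5, 0.75, 1]  # illustrative: varies across layers, 0 to 1
    - value: 0.5                      # default t where no filter matches
dtype: bfloat16
```

mergekit interpolates each value list across the layer range, so a short list is enough to describe a smooth per-layer schedule.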
This merging strategy selectively blends the features of the base models, aiming to leverage their individual strengths in different parts of the network. The merged weights are stored in bfloat16 for memory efficiency.
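To make the slerp method concrete, here is a minimal NumPy sketch of spherical linear interpolation between two flattened weight tensors, with a hypothetical per-layer t schedule; the toy vectors and schedule are illustrative, not the model's actual weights:

```python
import numpy as np

def slerp(v0: np.ndarray, v1: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    # Measure the angle between the tensors using unit-norm copies.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    omega = np.arccos(dot)
    if np.sin(omega) < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    # Interpolate along the great circle between the two directions.
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return s0 * v0 + s1 * v1

# Hypothetical schedule over 32 layers: t = 0 keeps model A, t = 1 keeps model B.
layer_ts = np.linspace(0.0, 1.0, 32)

a = np.array([1.0, 0.0])  # toy "weights" from model A
b = np.array([0.0, 1.0])  # toy "weights" from model B
merged_first = slerp(a, b, layer_ts[0])   # t = 0: exactly model A
merged_mid = slerp(a, b, 0.5)             # equal blend of both models
```

Unlike plain linear interpolation, slerp follows the arc between the two weight directions, which preserves the norm when the inputs share one; that is the usual motivation for using it in model merging.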
Key Capabilities
- General Text Generation: Capable of generating human-like text based on provided prompts.
- Chat Template Support: Designed to work with standard chat templates for conversational AI applications.
- Merged Intelligence: Benefits from the combined knowledge and capabilities of its two parent models, offering a unique blend of their characteristics.
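As a sketch of the chat-template support, the standard Hugging Face pattern below should apply, assuming the tokenizer ships a chat template; the exact prompt string it produces depends on that template:

```python
from transformers import AutoTokenizer

model_id = "Gille/StrangeMerges_34-7B-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain model merging in one sentence."}]
# Render the conversation with the model's bundled chat template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# For generation, load the weights in bfloat16, e.g. with
# AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).
```

Using `apply_chat_template` rather than hand-built prompt strings keeps the formatting consistent with whatever template the model was trained or merged to expect.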