Gille/StrangeMerges_34-7B-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Mar 7, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Gille/StrangeMerges_34-7B-slerp is a 7 billion parameter language model created by Gille, formed by a slerp merge of ContextualAI/Contextual_KTO_Mistral_PairRM and Gille/StrangeMerges_30-7B-slerp. The merge applies a specific slerp configuration across all 32 layers, with interpolation parameters (t) that vary between the self-attention and MLP blocks, to combine the strengths of its base models. It is designed for general text generation tasks and offers a 4096-token context window.


Model Overview

Gille/StrangeMerges_34-7B-slerp is a 7 billion parameter language model developed by Gille, created through a specific merging technique known as slerp (spherical linear interpolation). This model combines the characteristics of two distinct base models:

  • ContextualAI/Contextual_KTO_Mistral_PairRM
  • Gille/StrangeMerges_30-7B-slerp
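Slerp (spherical linear interpolation) blends two weight tensors along the arc between them rather than along a straight line, which tends to preserve the geometry of the parent models' weights better than plain averaging. The following is a minimal sketch of the idea, not the mergekit implementation itself; the function name and the NumPy-based formulation are illustrative:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values move
    along the arc between the two (normalized) directions.
    """
    # Normalize flattened copies to measure the angle between them.
    a = v0.ravel() / (np.linalg.norm(v0) + eps)
    b = v1.ravel() / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    theta = np.arccos(dot)
    # Nearly parallel tensors: fall back to plain linear interpolation.
    if abs(theta) < 1e-6:
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

In a real merge this interpolation is applied tensor-by-tensor across the two checkpoints, with t chosen per layer and per block type as described below.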

Merging Configuration

The merge was performed using LazyMergekit, applying the slerp method across all 32 layers of the constituent models. A notable aspect of this merge is the fine-grained control over the interpolation parameters (t):

  • Self-attention blocks use a t value that varies across layers, ranging from 0 to 0.7.
  • MLP blocks utilize a different t value range, from 0 to 1, also varying by layer.
  • A default t value of 0.5 is applied where specific filters are not defined.

This layer-wise schedule selectively blends the two base models: layers where t is near 0 stay close to one parent, while layers where t is near 1 take more from the other. The merged weights are stored in bfloat16 for efficiency.
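In mergekit's configuration format, a merge like the one described above would be expressed roughly as follows. This is a hedged reconstruction from the description, not the model's published config: the exact per-layer t schedules (the `value` lists) are illustrative, and only the layer count, filters, default t, and dtype are taken from the text.

```yaml
slices:
  - sources:
      - model: ContextualAI/Contextual_KTO_Mistral_PairRM
        layer_range: [0, 32]
      - model: Gille/StrangeMerges_30-7B-slerp
        layer_range: [0, 32]
merge_method: slerp
base_model: Gille/StrangeMerges_30-7B-slerp
parameters:
  t:
    # Self-attention blocks: t varies by layer, within 0 to 0.7 (schedule illustrative)
    - filter: self_attn
      value: [0, 0.3, 0.5, 0.7]
    # MLP blocks: t varies by layer, within 0 to 1 (schedule illustrative)
    - filter: mlp
      value: [0, 0.5, 0.7, 1]
    # Default t where no filter matches
    - value: 0.5
dtype: bfloat16
```

mergekit interpolates the `value` list across the layer range, so a short list defines a smooth per-layer schedule.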

Key Capabilities

  • General Text Generation: Capable of generating human-like text based on provided prompts.
  • Chat Template Support: Designed to work with standard chat templates for conversational AI applications.
  • Merged Intelligence: Benefits from the combined knowledge and capabilities of its two parent models, offering a unique blend of their characteristics.