Gille/StrangeMerges_7-7B-slerp

Text Generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 28, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

StrangeMerges_7-7B-slerp is a 7 billion parameter language model created by Gille by slerp-merging Gille/StrangeMerges_6-7B-dare_ties and berkeley-nest/Starling-LM-7B-alpha. The merge blends the weights of the two constituent models rather than training a new one, so the result inherits characteristics from both predecessors. The model targets general text generation tasks and supports a context length of 4096 tokens.


Overview

StrangeMerges_7-7B-slerp is a 7 billion parameter language model developed by Gille. It is constructed using a slerp (spherical linear interpolation) merge method from two distinct base models: Gille/StrangeMerges_6-7B-dare_ties and berkeley-nest/Starling-LM-7B-alpha.

Key Characteristics

  • Merge Technique: Uses slerp (spherical linear interpolation) to combine model weights, with separate interpolation factors (t values) for the self-attention (self_attn) and MLP (mlp) layers and a general t value for the remaining parameters; see the sketch after this list.
  • Base Models: Built upon Gille/StrangeMerges_6-7B-dare_ties as the primary base model, integrating features from berkeley-nest/Starling-LM-7B-alpha.
  • Parameter Count: A 7 billion parameter model, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context window of 4096 tokens.
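
For intuition, the sketch below shows how spherical linear interpolation blends two weight tensors for a given interpolation factor t. It is a minimal NumPy illustration of the underlying math, not the mergekit implementation; the function name, the epsilon threshold, and the lerp fallback are assumptions made for the example.

```python
import numpy as np

def slerp(t: float, w0: np.ndarray, w1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w0, t=1 returns w1; intermediate t moves along the arc
    between the two weight directions rather than the straight line.
    """
    # Normalize to find the angle between the two weight directions.
    v0 = w0 / (np.linalg.norm(w0) + eps)
    v1 = w1 / (np.linalg.norm(w1) + eps)
    dot = float(np.clip(np.dot(v0.ravel(), v1.ravel()), -1.0, 1.0))
    omega = np.arccos(dot)  # angle between the two directions

    # Nearly parallel weights: fall back to ordinary linear interpolation.
    if np.sin(omega) < eps:
        return (1.0 - t) * w0 + t * w1

    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * w0 + (np.sin(t * omega) / so) * w1
```

With t=0 the result matches one parent and with t=1 the other, so per-tensor t values let the merge favor one parent for the attention weights and the other for the MLP weights.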

Usage

This model is suitable for general text generation tasks, leveraging the combined strengths of its merged components. Developers can load it with the standard Hugging Face transformers pipeline, using the bfloat16 data type for efficient inference, as in the sketch below.
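
A minimal loading sketch based on the description above; the prompt and generation settings are illustrative assumptions, not recommendations from the model author.

```python
import torch
from transformers import pipeline

# Load the merged model in bfloat16 via the standard text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="Gille/StrangeMerges_7-7B-slerp",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place weights on the available accelerator(s) automatically
)

# Illustrative prompt; max_new_tokens is an assumed value, not a recommended setting.
result = generator("Write a short note on model merging.", max_new_tokens=64)
print(result[0]["generated_text"])
```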