PistachioAlt/Noromaid-Bagel-7B-Slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 25, 2023 · License: cc-by-nc-4.0 · Architecture: Transformer

PistachioAlt/Noromaid-Bagel-7B-Slerp is a 7 billion parameter language model published by PistachioAlt, produced via a Slerp merge of jondurbin/bagel-dpo-7b-v0.1 and NeverSleep/Noromaid-7b-v0.1.1. The merge applies specific parameter weightings to the self-attention and MLP layers, and the resulting model is intended for general language tasks that draw on the strengths of both base models.


Noromaid-Bagel-7B-Slerp: A Merged Language Model

PistachioAlt/Noromaid-Bagel-7B-Slerp is a 7 billion parameter model created through a Slerp (Spherical Linear Interpolation) merge of two distinct base models: jondurbin/bagel-dpo-7b-v0.1 and NeverSleep/Noromaid-7b-v0.1.1. This merging technique allows for a nuanced combination of the characteristics and strengths of its constituent models.
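As background on the technique, the following is a minimal sketch of spherical linear interpolation between two weight tensors, in the spirit of what merge tooling performs for each matched tensor pair. The function signature, the normalization step, and the fallback-to-linear threshold are illustrative assumptions, not the model author's code.

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns `a`, t=1 returns `b`; intermediate values follow the
    great-circle arc between the flattened, direction-normalized tensors.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    # Compute the angle between the two tensors from their unit directions.
    a_norm = a_flat / (np.linalg.norm(a_flat) + eps)
    b_norm = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_norm, b_norm), -1.0, 1.0)
    omega = np.arccos(dot)
    if np.abs(np.sin(omega)) < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * a + t * b
    coeff_a = np.sin((1.0 - t) * omega) / np.sin(omega)
    coeff_b = np.sin(t * omega) / np.sin(omega)
    return (coeff_a * a_flat + coeff_b * b_flat).reshape(a.shape)
```

Unlike plain linear interpolation, Slerp follows the arc between the two weight directions, which better preserves the magnitude structure of the blended tensors.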

Key Merging Details

The Slerp merge method was applied across the full layer ranges of both base models. A specific parameter weighting scheme governed the merge, with different t values applied to different components (see the sketch after this list for how such a schedule maps to per-layer blend weights):

  • Self-Attention Layers: The t parameter for the self-attention layers was set to [0, 0.5, 0.3, 0.7, 1], a gradient that shifts the blend from one base model toward the other across the depth of the network (t = 0 keeps one base model's weights, t = 1 the other's).
  • MLP Layers: For the Multi-Layer Perceptron (MLP) layers, the t parameter was set to [1, 0.5, 0.7, 0.3, 0], the elementwise complement of the self-attention schedule, so wherever the attention blend favors one base model, the MLP blend favors the other.
  • General Parameters: A default t value of 0.3 was applied to all remaining parameters not matched by the self-attention or MLP filters.
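As a rough illustration of how a five-point t schedule translates into per-layer blend weights, the sketch below spreads the anchor values evenly across a hypothetical 32-layer stack and linearly interpolates between them, in the style of mergekit-like gradient handling. The layer count and the interpolation scheme are assumptions for illustration, not details stated in the model card.

```python
import numpy as np

# Five-point t schedules from the merge configuration.
SELF_ATTN_T = [0.0, 0.5, 0.3, 0.7, 1.0]
MLP_T = [1.0, 0.5, 0.7, 0.3, 0.0]

def per_layer_t(schedule: list[float], num_layers: int = 32) -> np.ndarray:
    """Expand a gradient schedule into one t value per layer.

    Anchor points are spread evenly across the layer stack, and
    intermediate layers receive linearly interpolated values.
    """
    anchors = np.linspace(0.0, num_layers - 1, num=len(schedule))
    layers = np.arange(num_layers)
    return np.interp(layers, anchors, schedule)

if __name__ == "__main__":
    attn_t = per_layer_t(SELF_ATTN_T)
    mlp_t = per_layer_t(MLP_T)
    # t=0 keeps the first base model's weights, t=1 the second's.
    print("self_attn t, first/last layers:", attn_t[0], attn_t[-1])
    print("mlp t, first/last layers:", mlp_t[0], mlp_t[-1])
```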

This merging strategy aims to produce a model that inherits beneficial traits from both bagel-dpo-7b-v0.1 and Noromaid-7b-v0.1.1, offering a blend of their capabilities for general-purpose language generation and understanding tasks.
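For reference, here is a minimal sketch of loading and sampling from the merged model with Hugging Face transformers, assuming the weights are available on the Hub under the listed repository name and that a suitable device is present; the prompt and generation settings are illustrative, not recommendations from the model author.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "PistachioAlt/Noromaid-Bagel-7B-Slerp"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available devices (requires accelerate)
)

prompt = "Write a short story about a lighthouse keeper."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```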