Noromaid-Bagel-7B-Slerp: A Merged Language Model
PistachioAlt/Noromaid-Bagel-7B-Slerp is a 7-billion-parameter model created through a Slerp (spherical linear interpolation) merge of two distinct base models: jondurbin/bagel-dpo-7b-v0.1 and NeverSleep/Noromaid-7b-v0.1.1. This merging technique combines the characteristics and strengths of its constituent models more smoothly than simple weight averaging.
Key Merging Details
The Slerp merge method was applied to combine the full layer ranges of both base models. A specific parameter weighting scheme was used to influence the merge, with varying values applied to different components:
- Self-Attention Layers: The t parameter for the self-attention layers was set to [0, 0.5, 0.3, 0.7, 1], producing a non-uniform blend across the depth of the network.
- MLP Layers: The t parameter for the Multi-Layer Perceptron (MLP) layers was set to [1, 0.5, 0.7, 0.3, 0], the elementwise complement of the attention schedule (each value is 1 minus the corresponding attention value).
- General Parameters: A default t value of 0.3 was applied to all remaining parameters not covered by the specific filters.
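The weighting scheme above corresponds to a mergekit-style Slerp recipe. The YAML below is a reconstruction from the parameters described in this card, not the verbatim config shipped with the model; in particular, the layer count, base_model choice, and dtype are assumptions.

```yaml
slices:
  - sources:
      - model: NeverSleep/Noromaid-7b-v0.1.1
        layer_range: [0, 32]   # assumption: full range of a 32-layer 7B model
      - model: jondurbin/bagel-dpo-7b-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: NeverSleep/Noromaid-7b-v0.1.1   # assumption: either model could serve as base
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # attention schedule from the card
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # MLP schedule from the card
    - value: 0.3                      # default t for all other parameters
dtype: bfloat16   # assumption: not stated in the card
```

In a config like this, t = 0 keeps the first model's weights, t = 1 keeps the second's, and intermediate values interpolate along the arc between them; the listed values are spread evenly across the layer range given in each filter.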
This precise merging strategy aims to create a model that inherits beneficial traits from both bagel-dpo-7b-v0.1 and Noromaid-7b-v0.1.1, potentially offering improved performance or a unique blend of capabilities for general-purpose language generation and understanding tasks.
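For readers unfamiliar with the underlying operation, the following is a minimal NumPy sketch of Slerp as applied to a pair of weight tensors. This illustrates the general technique, not the model's actual merge code; the flatten-and-normalize approach for computing the angle is one common convention.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the great-circle arc between v0 and v1,
    falling back to linear interpolation when the tensors are
    nearly colinear (where slerp is numerically unstable).
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    # Angle between the normalized, flattened tensors.
    dot = np.dot(v0_flat / (np.linalg.norm(v0_flat) + eps),
                 v1_flat / (np.linalg.norm(v1_flat) + eps))
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    if abs(np.sin(theta)) < eps:
        # Nearly colinear: plain lerp is equivalent and stable.
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1

# t = 0 returns the first tensor, t = 1 the second,
# and intermediate t values blend along the arc.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)  # stays on the unit circle, unlike (a + b) / 2
```

Unlike linear interpolation, which shrinks the norm of the blended weights when the two tensors point in different directions, Slerp preserves the geometry of the interpolation path, which is why it is a popular choice for model merging.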