r2rss/Malachite-7b-v0

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 2, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

r2rss/Malachite-7b-v0 is a 7 billion parameter language model created by r2rss, formed by merging zyh3826/GML-Mistral-merged-v1 and cookinai/CatMacaroni-Slerp with the slerp merge method. The merge applies parameter-specific interpolation weights to the self-attention and MLP layers, blending the capabilities of its constituent models. It is designed for general language tasks and inherits the strengths of its Mistral-based components.


Malachite-7b-v0 Overview

Malachite-7b-v0 is a 7 billion parameter language model developed by r2rss. It was produced with mergekit by combining two source models: zyh3826/GML-Mistral-merged-v1 and cookinai/CatMacaroni-Slerp.

Key Characteristics

  • Merge Method: Uses slerp (spherical linear interpolation), which interpolates along the arc between weight vectors rather than averaging them linearly, and is widely used for producing stable, high-quality merges.
  • Layer-Specific Blending: The merge applies different interpolation weights (t) to the self_attn and mlp layers, combining features from the two source models on a per-layer basis (a reconstructed mergekit configuration follows this list).
    • self_attn layers use t values of [0, 0.5, 0.3, 0.7, 1], interpolated across the layer stack.
    • mlp layers use the mirrored schedule [1, 0.5, 0.7, 0.3, 0], so where the attention weights favor one source model, the MLP weights favor the other.
    • All other parameters use a default t value of 0.5.
  • Base Architecture: Inherits the Mistral architecture: both source models are Mistral derivatives, and CatMacaroni-Slerp serves as the base model for the merge.
  • Precision: The merge is configured to use the bfloat16 data type, balancing numerical fidelity against memory use.
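The parameters above imply a mergekit configuration along the following lines. This is a reconstruction from the stated values, not the author's published file; in particular, the layer_range of [0, 32] is an assumption based on the standard 32-layer Mistral-7B architecture.

```yaml
# Reconstructed mergekit slerp config. The t values, base model, and dtype
# come from the description above; layer_range [0, 32] is assumed
# (the standard Mistral-7B depth).
slices:
  - sources:
      - model: zyh3826/GML-Mistral-merged-v1
        layer_range: [0, 32]
      - model: cookinai/CatMacaroni-Slerp
        layer_range: [0, 32]
merge_method: slerp
base_model: cookinai/CatMacaroni-Slerp
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5  # default for all remaining tensors
dtype: bfloat16
```

With mergekit installed, a configuration like this is run with `mergekit-yaml config.yml ./merged-model`.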

Potential Use Cases

Malachite-7b-v0 is suited to general-purpose language generation and understanding tasks, drawing on the combined strengths of its merged components. Its layer-specific merging strategy aims for balanced performance across linguistic capabilities rather than specializing toward either source model. A minimal loading sketch follows.
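For reference, here is a minimal sketch of loading and prompting the model with Hugging Face transformers. It assumes the weights are hosted on the Hub under the r2rss/Malachite-7b-v0 repository id and that enough GPU or CPU memory is available for a 7B model in bfloat16 (roughly 14 GB):

```python
# Minimal loading sketch (assumes the checkpoint is hosted on the
# Hugging Face Hub as "r2rss/Malachite-7b-v0").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "r2rss/Malachite-7b-v0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype noted above
    device_map="auto",           # place layers across available devices
)

prompt = "Explain spherical linear interpolation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```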