flammenai/flammen11-mistral-7B

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Mar 24, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

flammenai/flammen11-mistral-7B is a 7-billion-parameter language model created by flammenai through a SLERP merge of nbeerbower/flammen10-mistral-7B and nbeerbower/flammen8-mistral-7B. It uses the Mistral architecture and is intended for general text generation within a 4096-token context window. The merge configuration is meant to combine the strengths of the two parent models.


Overview

flammenai/flammen11-mistral-7B is a 7-billion-parameter language model built on the Mistral architecture. flammenai created it with the mergekit tool using the SLERP (spherical linear interpolation) merge method. The model combines two pre-trained models: nbeerbower/flammen10-mistral-7B and nbeerbower/flammen8-mistral-7B.
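For reference, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which preserves their geometry better than plain averaging. The standard formula is given here, assuming each weight tensor is flattened to a vector; w_0 is the base model's tensor, w_1 the other model's, and t in [0, 1] the interpolation factor (these symbols are ours, not the card's). At t = 0 the result is w_0 and at t = 1 it is w_1. When the vectors are nearly parallel, sin θ approaches 0, so implementations typically fall back to linear interpolation.

```latex
\cos\theta = \frac{w_0 \cdot w_1}{\lVert w_0 \rVert \, \lVert w_1 \rVert},
\qquad
\operatorname{slerp}(w_0, w_1; t)
  = \frac{\sin\big((1 - t)\,\theta\big)}{\sin\theta}\, w_0
  + \frac{\sin(t\,\theta)}{\sin\theta}\, w_1
```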

Merge Details

The merge used the full layer range (0 to 32) of both nbeerbower/flammen10-mistral-7B and nbeerbower/flammen8-mistral-7B, covering all 32 transformer layers of the Mistral-7B architecture. The SLERP method was configured with separate interpolation parameters for the self-attention and MLP layers to balance the two models' contributions. nbeerbower/flammen10-mistral-7B served as the base model, and the merge was computed in bfloat16 precision.
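A minimal Python sketch of how a SLERP merge could be applied tensor by tensor, assuming each tensor is flattened to a list of floats. The per-module t values in HYPOTHETICAL_T are placeholders, since the card does not state the actual parameters. Selecting t by matching "self_attn" or "mlp" in the tensor name mirrors how mergekit SLERP configs commonly use name filters, but the helper names (slerp, lerp, t_for, merge_tensor) are our own illustration, not mergekit's API. The 0.9995 threshold for falling back to linear interpolation is likewise an assumed value.

```python
import math
from typing import List

# Assumed cutoff: above this |cos(theta)| the vectors are treated as parallel.
DOT_THRESHOLD = 0.9995

# Hypothetical per-module interpolation factors (not the real flammen11 values).
HYPOTHETICAL_T = {"self_attn": 0.3, "mlp": 0.7}
DEFAULT_T = 0.5


def lerp(t: float, v0: List[float], v1: List[float]) -> List[float]:
    """Plain linear interpolation, used when SLERP is numerically unstable."""
    return [(1 - t) * a + t * b for a, b in zip(v0, v1)]


def slerp(t: float, v0: List[float], v1: List[float], eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flattened weight tensors.

    t = 0 returns v0 (the base model's weights); t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Cosine of the angle between the two weight vectors.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    if abs(dot) > DOT_THRESHOLD:
        # Nearly (anti)parallel: sin(theta) ~ 0, so fall back to linear interpolation.
        return lerp(t, v0, v1)
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    # Coefficients come from the normalized angle but are applied to the original vectors.
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def t_for(tensor_name: str) -> float:
    """Pick t by matching a module filter against the tensor name."""
    for key, t in HYPOTHETICAL_T.items():
        if key in tensor_name:
            return t
    return DEFAULT_T


def merge_tensor(name: str, base: List[float], other: List[float]) -> List[float]:
    """Merge one tensor, treating `base` as the base model's weights."""
    return slerp(t_for(name), base, other)
```

For example, merge_tensor("model.layers.0.self_attn.q_proj.weight", [1.0, 0.0], [0.0, 1.0]) uses t = 0.3 and returns a unit-length vector tilted toward the base weights; interpolating along the arc keeps the result on the unit circle, whereas plain averaging of these two vectors would shrink its norm.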

Key Characteristics

  • Architecture: Mistral-7B base.
  • Parameter Count: 7 billion parameters.
  • Merge Method: SLERP, combining two distinct Mistral-7B variants.
  • Context Length: Supports a 4096-token context window.
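As a rough illustration of what the 4096-token window means in practice: for a decoder-only model like this one, the prompt and the generated tokens share the same window, so a generation budget has to be clamped to whatever space the prompt leaves. The helper here is a hypothetical sketch, not part of any library; token counts are assumed to come from the model's own tokenizer.

```python
CONTEXT_WINDOW = 4096  # tokens, as listed for this model


def max_new_tokens(prompt_tokens: int, requested: int, window: int = CONTEXT_WINDOW) -> int:
    """Clamp a generation budget so prompt + output fits in the context window."""
    if prompt_tokens >= window:
        raise ValueError(f"prompt uses {prompt_tokens} tokens; window is {window}")
    return min(requested, window - prompt_tokens)
```

With a 3000-token prompt, a request for 2000 new tokens is clamped to 1096.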

Potential Use Cases

This model is suited to general-purpose natural language processing tasks, drawing on the combined capabilities of its two parent models. Developers looking for a Mistral-7B variant whose behavior is shaped by the SLERP merge of flammen10 and flammen8 may find it useful for text generation and comprehension tasks; any advantage over the parent models should be checked on the target task.