InnerI/InnerILLM-OpenPipe-Nous-Yarn-Mistral-optimized-1228-7B-slerp

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Feb 13, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

InnerI/InnerILLM-OpenPipe-Nous-Yarn-Mistral-optimized-1228-7B-slerp is a 7 billion parameter language model created by InnerI, formed by merging OpenPipe/mistral-ft-optimized-1218 and NousResearch/Yarn-Mistral-7b-128k using a slerp merge method. This model combines the strengths of an instruction-tuned Mistral variant with a long-context Mistral variant, aiming for optimized performance across various tasks. Its architecture is based on the Mistral family, offering a balance of efficiency and capability for general-purpose applications.


Overview

InnerILLM-OpenPipe-Nous-Yarn-Mistral-optimized-1228-7B-slerp is a 7 billion parameter language model developed by InnerI. It is a product of merging two distinct Mistral-based models: OpenPipe/mistral-ft-optimized-1218 and NousResearch/Yarn-Mistral-7b-128k. The merge was performed using the slerp (spherical linear interpolation) method via LazyMergekit, allowing for a nuanced combination of their respective features.
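A LazyMergekit slerp merge of this kind is driven by a small YAML configuration. The sketch below shows the typical shape of such a config for these two source models; the `layer_range`, `t` values, and `dtype` are illustrative placeholders (the exact ratios used for this model are not reproduced here), not the published recipe.

```yaml
# Hypothetical mergekit slerp config (illustrative values only)
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: NousResearch/Yarn-Mistral-7b-128k
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn   # per-layer interpolation ratios for attention weights
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp         # a different schedule for the MLP weights
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5          # default ratio for all remaining tensors
dtype: bfloat16
```

A `t` of 0 keeps the base model's weights for that tensor, 1 takes the other model's, and intermediate values interpolate spherically between them.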

Key Characteristics

  • Merged Architecture: Combines an instruction-tuned model (mistral-ft-optimized-1218) with a long-context model (Yarn-Mistral-7b-128k).
  • Slerp Merge Method: Utilizes spherical linear interpolation for merging, which can lead to a more balanced integration of model weights compared to simpler averaging.
  • Parameter Configuration: Specific t (interpolation ratio) values were applied during the merge, with different ratios for the self-attention and MLP layers, indicating a tailored approach to combining the models' strengths.
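To make the slerp step concrete, here is a minimal sketch of spherical linear interpolation applied to two weight tensors, using NumPy. This is an illustrative implementation of the general technique, not the actual mergekit code; the fallback to linear interpolation for near-colinear tensors mirrors what slerp mergers commonly do.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t moves along the
    great-circle arc between the (flattened) tensors.
    """
    v0_f = v0.flatten()
    v1_f = v1.flatten()
    # Cosine of the angle between the two flattened tensors
    dot = np.dot(v0_f, v1_f) / (np.linalg.norm(v0_f) * np.linalg.norm(v1_f) + eps)
    dot = np.clip(dot, -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly colinear tensors: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * v0 + \
           (np.sin(t * theta) / sin_theta) * v1

# Example: blend two toy "weight matrices" at t = 0.5
a = np.eye(2)
b = np.array([[0.0, 1.0], [1.0, 0.0]])
merged = slerp(0.5, a, b)
```

Compared with simple weight averaging, slerp preserves the norm-direction geometry of the interpolated tensors, which is why merges like this one often favor it.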

Potential Use Cases

This model is designed to leverage the benefits of both its constituent models. It could be particularly well-suited for applications requiring:

  • General Instruction Following: Benefiting from the instruction-tuned base model.
  • Extended Context Understanding: Inheriting capabilities from the long-context Yarn-Mistral-7b-128k.
  • Balanced Performance: Aiming for a versatile model that performs robustly across a range of tasks without specializing in a single domain.