osanseviero/mistral-instruct-slerp

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 10, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

osanseviero/mistral-instruct-slerp is a 7-billion-parameter instruction-tuned language model created by osanseviero. It merges two versions of Mistral-7B-Instruct (v0.1 and v0.2) using the SLERP method, aiming to combine the instruction-following behavior of both within a 4096-token context window. It is intended for general-purpose conversational AI and instruction-based tasks built on the Mistral architecture.


Overview

This model, osanseviero/mistral-instruct-slerp, is a 7 billion parameter instruction-tuned language model. It was created by osanseviero using the mergekit tool, specifically employing the SLERP (Spherical Linear Interpolation) merge method.
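
For intuition, SLERP interpolates between two weight tensors along the surface of a hypersphere rather than along a straight line. Below is a minimal NumPy sketch of the operation on flattened tensors; it is a simplified illustration, not mergekit's exact implementation (which also handles normalization details and edge cases differently):

```python
import numpy as np

def slerp(t: float, w1: np.ndarray, w2: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    v1 = w1 / (np.linalg.norm(w1) + eps)   # unit direction of the first tensor
    v2 = w2 / (np.linalg.norm(w2) + eps)   # unit direction of the second tensor
    dot = np.clip(np.dot(v1, v2), -1.0, 1.0)
    omega = np.arccos(dot)                 # angle between the two directions
    if omega < eps:                        # nearly parallel: fall back to lerp
        return (1.0 - t) * w1 + t * w2
    so = np.sin(omega)
    # Interpolate the direction along the great circle, then restore an
    # interpolated magnitude.
    direction = (np.sin((1.0 - t) * omega) / so) * v1 + (np.sin(t * omega) / so) * v2
    norm = (1.0 - t) * np.linalg.norm(w1) + t * np.linalg.norm(w2)
    return norm * direction
```

At t = 0 the result is the first tensor, at t = 1 the second; intermediate values trace the shortest arc between the two directions instead of cutting through the interior of the sphere as a plain weighted average would.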

Merge Details

The model is a merge of two distinct versions of the Mistral-7B-Instruct base model:

  • mistralai/Mistral-7B-Instruct-v0.1
  • mistralai/Mistral-7B-Instruct-v0.2

The SLERP method was applied across all 32 layers of the models. The merge configuration specifies separate interpolation schedules (t values) for the self-attention and MLP weights, rather than a single uniform blending factor. mistralai/Mistral-7B-Instruct-v0.2 served as the base model, and the merge was performed in bfloat16.
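
A representative mergekit configuration for this kind of merge, written out via Python for concreteness. The structure (layer slices, per-filter t schedules, base model, dtype) follows the details above; the exact numeric t values are illustrative assumptions, not necessarily those used for this model:

```python
import yaml

# Representative SLERP merge config; the t schedules below are illustrative.
merge_config = {
    "slices": [{
        "sources": [
            {"model": "mistralai/Mistral-7B-Instruct-v0.1", "layer_range": [0, 32]},
            {"model": "mistralai/Mistral-7B-Instruct-v0.2", "layer_range": [0, 32]},
        ],
    }],
    "merge_method": "slerp",
    "base_model": "mistralai/Mistral-7B-Instruct-v0.2",
    "parameters": {
        "t": [
            # Per-layer interpolation schedules for attention and MLP weights.
            {"filter": "self_attn", "value": [0.0, 0.5, 0.3, 0.7, 1.0]},
            {"filter": "mlp", "value": [1.0, 0.5, 0.7, 0.3, 0.0]},
            {"value": 0.5},  # default t for everything else
        ],
    },
    "dtype": "bfloat16",
}

with open("merge_config.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)
```

Running `mergekit-yaml merge_config.yaml ./mistral-instruct-slerp` would then produce the merged checkpoint.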

Key Capabilities

  • Enhanced Instruction Following: By merging two instruction-tuned Mistral models, this variant aims to consolidate and potentially improve their instruction-following capabilities.
  • Mistral Architecture: Benefits from the efficient and performant Mistral 7B architecture.
  • SLERP Merge Method: Interpolates weights along a spherical path rather than a straight line, which avoids the magnitude shrinkage that plain linear averaging can introduce (see the sketch in the Overview above).

Good For

  • General-purpose instruction-based tasks and conversational AI where the Mistral 7B architecture is suitable.
  • Developers who want a refined instruction-tuned model from the Mistral family that may outperform either base model individually thanks to the SLERP merge (a minimal usage sketch follows below).
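
A minimal usage sketch with Hugging Face transformers, assuming the model is available on the Hub under this repo id and ships the Mistral-Instruct chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "osanseviero/mistral-instruct-slerp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load the checkpoint's native dtype
    device_map="auto",    # requires `accelerate` for automatic placement
)

messages = [{"role": "user", "content": "Summarize SLERP model merging in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```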