Eric111/Mistral-7B-Instruct_v0.2_UNA-TheBeagle-7b-v1

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Mar 7, 2024 · License: cc-by-nc-nd-4.0 · Architecture: Transformer

Eric111/Mistral-7B-Instruct_v0.2_UNA-TheBeagle-7b-v1 is a 7-billion-parameter language model created by Eric111 by merging Mistral-7B-Instruct-v0.2 and UNA-TheBeagle-7b-v1 with the SLERP method. The merge aims to combine the strengths of both parents and offers a 4096-token context length. It is intended for general instruction-following tasks, pairing a strong instruction-tuned base with a specialized fine-tune.


Overview

Eric111/Mistral-7B-Instruct_v0.2_UNA-TheBeagle-7b-v1 is a 7-billion-parameter language model resulting from a merge of two pre-trained models: mistralai/Mistral-7B-Instruct-v0.2 and fblgit/UNA-TheBeagle-7b-v1. The merge was performed with SLERP (Spherical Linear Interpolation), a technique commonly used to blend the weights of two same-architecture models while retaining much of each parent's behavior.
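For intuition, SLERP interpolates along the arc of a great circle between two weight vectors rather than along the straight chord used by plain averaging, which keeps the interpolated weights at a sensible scale. Below is a minimal numpy sketch of the core formula; it is an illustration only, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t: float, a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns `a`, t=1 returns `b`; intermediate values follow the
    great-circle arc between the tensors' directions instead of the
    straight line used by plain linear averaging.
    """
    a_dir = a.ravel() / (np.linalg.norm(a) + eps)
    b_dir = b.ravel() / (np.linalg.norm(b) + eps)
    # Angle between the two flattened weight vectors.
    omega = np.arccos(np.clip(np.dot(a_dir, b_dir), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel tensors: fall back to linear interpolation.
        return (1.0 - t) * a + t * b
    sin_omega = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / sin_omega) * a \
        + (np.sin(t * omega) / sin_omega) * b
```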

Key Capabilities

  • Instruction Following: Inherits instruction-following capabilities from its base models, particularly Mistral-7B-Instruct-v0.2.
  • Combined Strengths: Aims to leverage the distinct characteristics and knowledge bases of both Mistral-7B-Instruct-v0.2 and UNA-TheBeagle-7b-v1.
  • Efficient Parameter Count: At 7 billion parameters, it offers a balance between performance and computational efficiency.

Merge Details

The merge was produced with mergekit from a YAML configuration. The SLERP method was applied across all layers, with different interpolation values (t) for the self-attention and MLP blocks, so attention and feed-forward weights are blended in different proportions at different depths. The base model for the merge was mistralai/Mistral-7B-Instruct-v0.2.
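The exact configuration is not reproduced on this page, but mergekit SLERP configs of this kind typically express t as short anchor curves, one per tensor filter (e.g. self_attn, mlp), stretched across layer depth. The sketch below shows how such a schedule could resolve to a per-tensor interpolation factor; the anchor values and helper function are hypothetical, not the settings actually used for this merge:

```python
import numpy as np

# Hypothetical anchor curves in the shape a mergekit SLERP config uses:
# one curve for self-attention tensors, one for MLP tensors, and a flat
# default for everything else. These values are illustrative only.
T_SCHEDULES = {
    "self_attn": [0.0, 0.5, 0.3, 0.7, 1.0],
    "mlp":       [1.0, 0.5, 0.7, 0.3, 0.0],
    "default":   [0.5],
}

def t_for(tensor_name: str, layer: int, num_layers: int) -> float:
    """Resolve the interpolation factor for one tensor at one layer depth."""
    anchors = T_SCHEDULES["default"]
    for key, curve in T_SCHEDULES.items():
        if key != "default" and key in tensor_name:
            anchors = curve
            break
    # Stretch the anchor curve over the layer stack and interpolate.
    depth = layer / max(num_layers - 1, 1)
    return float(np.interp(depth, np.linspace(0.0, 1.0, len(anchors)), anchors))

# Example: attention weights in layer 16 of a 32-layer stack.
print(t_for("model.layers.16.self_attn.q_proj.weight", 16, 32))
```

With a schedule like this, shallow and deep layers lean toward different parents, and attention and feed-forward blocks are blended on opposing curves, matching the varying-t behavior described above.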

Good For

  • General-purpose instruction-tuned applications.
  • Scenarios requiring a model that combines the robust base of Mistral with potential enhancements from UNA-TheBeagle-7b-v1.
  • Developers looking for a 7B model with a 4096-token context window for various NLP tasks; a minimal loading sketch follows below.
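As a starting point, the model can be loaded like any Hugging Face causal LM. The sketch below assumes the repository ships a chat template (Mistral-Instruct's [INST] ... [/INST] format is the usual fallback) and uses an illustrative prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Eric111/Mistral-7B-Instruct_v0.2_UNA-TheBeagle-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format the conversation with the tokenizer's chat template, if present.
messages = [{"role": "user", "content": "Explain SLERP model merging in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```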