flemmingmiguel/MBX-7B-v3

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 28, 2024 · License: apache-2.0 · Architecture: Transformer

MBX-7B-v3 is a 7-billion-parameter language model developed by flemmingmiguel, created by merging flemmingmiguel/MBX-7B and flemmingmiguel/MBX-7B-v3 using LazyMergekit. The merge uses the slerp method across all 32 layers, with separate interpolation weights for the self_attn and mlp components. The model targets general text generation and offers a 4096-token context length.


MBX-7B-v3 Overview

MBX-7B-v3 is a 7-billion-parameter language model developed by flemmingmiguel. It was produced by merging two existing models, flemmingmiguel/MBX-7B and flemmingmiguel/MBX-7B-v3, using the LazyMergekit tool.

Merge Configuration

The merge process employs a slerp (spherical linear interpolation) method across all 32 layers of the constituent models. Specific weighting parameters were applied:

  • Self-attention (self_attn) tensors: interpolation weights varying from 0 to 1 across the layer stack.
  • Multi-layer perceptron (mlp) tensors: interpolation weights varying from 0 to 1 across the layer stack.
  • All other tensors: a constant fallback weight of 0.45.

This configuration aims to combine the strengths of the base models into a unified architecture.
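The spherical linear interpolation behind this merge can be sketched in a few lines. The function below is a generic slerp over two flattened weight tensors, written as a standalone illustration; it is not LazyMergekit's actual implementation, and the blending of per-layer weight schedules is omitted:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate values of t move along the
    great-circle arc between the two weight directions rather than the straight
    line a plain average would take.
    """
    v0f = v0.ravel().astype(np.float64)
    v1f = v1.ravel().astype(np.float64)
    # Cosine of the angle between the two weight vectors.
    dot = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f) + eps)
    dot = np.clip(dot, -1.0, 1.0)
    omega = np.arccos(dot)
    if omega < eps:
        # Nearly parallel tensors: fall back to ordinary linear interpolation.
        return (1.0 - t) * v0 + t * v1
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return (s0 * v0f + s1 * v1f).reshape(v0.shape)
```

Applied per tensor with the weights above, this would blend self_attn and mlp tensors along their layer-specific schedules and everything else at the fixed 0.45 fallback.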

Usage and Accessibility

The model works with a standard transformers text-generation pipeline and supports a context length of 4096 tokens. A quantized GGUF version is also available for more efficient deployment on a range of hardware. The provided Python snippet shows how to load the model and generate text with custom sampling parameters such as max_new_tokens, temperature, top_k, and top_p.
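A minimal sketch of such a pipeline setup is below. The sampling values and the prompt are illustrative assumptions (the card names the knobs but not their values), and model loading is deferred so that importing the module does not trigger the multi-gigabyte weight download:

```python
MODEL_ID = "flemmingmiguel/MBX-7B-v3"

# Illustrative sampling parameters; adjust to taste.
GEN_KWARGS = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.95,
}

def generate(prompt: str) -> str:
    """Load the model and return a completion (first call downloads the weights)."""
    import torch  # deferred: heavy dependencies
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    out = pipe(prompt, **GEN_KWARGS)
    return out[0]["generated_text"]
```

Calling `generate("…")` then runs a single sampled completion; on CPU-only machines the GGUF build with a llama.cpp-based runtime is usually the more practical route.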