flemmingmiguel/MBX-7B-v3
MBX-7B-v3 Overview
MBX-7B-v3 is a 7-billion-parameter language model developed by flemmingmiguel. It is the product of merging two existing models, flemmingmiguel/MBX-7B and flemmingmiguel/MBX-7B-v3, with the LazyMergekit tool.
Merge Configuration
The merge process employs a slerp (spherical linear interpolation) method across all 32 layers of the constituent models. Specific weighting parameters were applied:
- Self-attention (self_attn) tensors: interpolation weights varying between 0 and 1 across the layer range.
- Multi-layer perceptron (mlp) tensors: interpolation weights varying between 0 and 1 across the layer range.
- All other tensors: a constant fallback weight of 0.45.
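Slerp interpolates along the arc between two weight vectors rather than along the straight line between them, which better preserves their magnitudes. A minimal illustrative sketch of the operation (not mergekit's actual implementation):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between vectors v0 and v1.

    t=0 returns v0, t=1 returns v1; intermediate t values move
    along the arc between them. Illustrative only.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

The per-tensor weights described above play the role of `t`: a weight of 0 keeps the first model's tensor, 1 keeps the second's, and intermediate values blend along the arc.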
This configuration aims to combine the strengths of the base models into a unified architecture.
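LazyMergekit merges are driven by a YAML configuration. A hypothetical sketch consistent with the parameters described above; the exact per-layer `t` schedules, the `base_model` choice, and the `dtype` are illustrative assumptions, not taken from the model card:

```yaml
slices:
  - sources:
      - model: flemmingmiguel/MBX-7B
        layer_range: [0, 32]
      - model: flemmingmiguel/MBX-7B-v3
        layer_range: [0, 32]
merge_method: slerp
base_model: flemmingmiguel/MBX-7B   # assumed; the card does not state the base
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]  # illustrative schedule between 0 and 1
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]  # illustrative schedule between 0 and 1
    - value: 0.45                   # fallback weight for all other tensors
dtype: bfloat16                     # assumed
```

Mergekit interpolates each listed `value` schedule across the layer range, so a single short list defines a per-layer weight curve.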
Usage and Accessibility
The model is used through a standard transformers text-generation pipeline and supports a context length of 4096 tokens. A quantized GGUF version is also available for more efficient deployment on a wider range of hardware. The model card's Python snippet shows how to load the model and generate text with sampling parameters such as max_new_tokens, temperature, top_k, and top_p.
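The usage described above might look like the following sketch of a standard transformers pipeline. The prompt, sampling values, and dtype/device settings are illustrative assumptions, not taken from the model card:

```python
def generation_config(max_new_tokens=256, temperature=0.7, top_k=50, top_p=0.95):
    """Collect sampling parameters to pass to the pipeline call.

    Default values are illustrative, not the author's recommendations.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
    }

if __name__ == "__main__":
    import torch
    from transformers import AutoTokenizer, pipeline

    model_id = "flemmingmiguel/MBX-7B-v3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    generator = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.float16,  # assumed; use what your hardware supports
        device_map="auto",
    )
    # Build a chat-formatted prompt from the tokenizer's template.
    messages = [{"role": "user", "content": "Explain spherical linear interpolation."}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    output = generator(prompt, **generation_config())
    print(output[0]["generated_text"])
```

Loading the full model requires downloading its weights; the GGUF build mentioned above is the lighter-weight option for CPU or low-VRAM inference.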