flemmingmiguel/MBX-7B-v2
flemmingmiguel/MBX-7B-v2 is a 7 billion parameter language model created by flemmingmiguel, built through a slerp merge of flemmingmiguel/MBX-7B and flemmingmiguel/MBX-7B-v2. The merge applies layer-dependent interpolation weights to combine the strengths of its base models. It is designed for general text generation tasks within its 4096-token context window.
Overview
MBX-7B-v2 is a 7 billion parameter language model developed by flemmingmiguel. This model is a product of a slerp merge using LazyMergekit, combining two distinct models: flemmingmiguel/MBX-7B and flemmingmiguel/MBX-7B-v2.
Merge Configuration
The merge uses a specific configuration that applies different interpolation parameter (`t`) values to different components of the model architecture, as illustrated in the sketch after this list:
- Self-attention layers (`self_attn`): `t` values of `[0, 0.5, 0.3, 0.7, 1]` were applied across the layer blocks.
- MLP layers (`mlp`): a complementary range of `t` values, `[1, 0.5, 0.7, 0.3, 0]`, was used.
- Fallback: a `t` value of `0.45` was applied to all other tensors not explicitly covered by the above filters.
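For illustration, a mergekit slerp configuration consistent with the parameters above might look like the following sketch; the `layer_range` and the choice of `base_model` are assumptions, since the author's exact configuration file is not reproduced here:

```yaml
slices:
  - sources:
      # Assumed 32-layer range, typical for Mistral-style 7B models.
      - model: flemmingmiguel/MBX-7B
        layer_range: [0, 32]
      - model: flemmingmiguel/MBX-7B-v2
        layer_range: [0, 32]
merge_method: slerp
base_model: flemmingmiguel/MBX-7B  # assumed base model; not confirmed by the card
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.45  # fallback for all other tensors
dtype: float16
```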
This detailed merging strategy aims to create a model with a balanced integration of features from its constituent parts. The model operates with a float16 data type and supports a context length of 4096 tokens.
Usage
Developers can integrate MBX-7B-v2 into their projects using the transformers library for text generation tasks, as illustrated by the sketch below.
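The following is a minimal sketch based on the standard transformers pipeline API; the prompt content, sampling parameters, and device settings are illustrative assumptions rather than the author's published snippet.

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "flemmingmiguel/MBX-7B-v2"

# Build a text-generation pipeline; float16 matches the model's stated dtype.
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Format a chat-style prompt and generate a completion; the prompt plus the
# generated tokens must fit within the model's 4096-token context window.
messages = [{"role": "user", "content": "Explain what a slerp model merge is."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```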