chlee10/T3Q-Merge-Mistral7B
T3Q-Merge-Mistral7B is a 7 billion parameter language model developed by Chihoon Lee (chlee10) and T3Q, created by merging liminerity/M7-7b and yam-peleg/Experiment26-7B with mergekit. The merge uses the slerp (spherical linear interpolation) method with layer-dependent interpolation weights for the self-attention and MLP modules, aiming to combine the strengths of its constituent models. It is designed for general language tasks, building on the Mistral architecture with a 4096-token context length.
Model Overview
T3Q-Merge-Mistral7B is a 7 billion parameter language model developed by Chihoon Lee (chlee10) and T3Q. It is a merged model, combining two distinct base models: liminerity/M7-7b and yam-peleg/Experiment26-7B. The merge was performed using mergekit, specifically employing a slerp (spherical linear interpolation) merge method.
Merge Configuration
The merging process involved specific parameter adjustments:
- Self-attention layers: interpolation weights spanning 0 to 1, with values of 0.5, 0.3, and 0.7 applied across different layer groups.
- MLP layers: interpolation weights spanning 0 to 1, with values of 0.5, 0.7, and 0.3 applied.
- A fallback value of 0.5 for any other tensors not covered by these filters.
This configuration selectively blends the characteristics of the two source models, with the goal of improving performance across language understanding and generation tasks. The merge was carried out in bfloat16, and the model retains a 4096-token context length.
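To make the merge method concrete, the sketch below illustrates the idea behind slerp on a pair of weight tensors: the tensors are treated as vectors and interpolated along an arc rather than a straight line, with the fraction t controlling how much each source contributes. This is a minimal conceptual illustration, not mergekit's actual implementation; the function name, toy tensor shapes, and the example t values (taken from the filters above) are assumptions for demonstration only.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors (conceptual sketch)."""
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    # Normalize copies to measure the angle between the two weight vectors.
    v0_unit = v0_flat / (np.linalg.norm(v0_flat) + eps)
    v1_unit = v1_flat / (np.linalg.norm(v1_flat) + eps)
    dot = np.clip(np.dot(v0_unit, v1_unit), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly colinear tensors: fall back to plain linear interpolation.
        return ((1.0 - t) * v0_flat + t * v1_flat).reshape(v0.shape)
    sin_theta = np.sin(theta)
    w0 = np.sin((1.0 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return (w0 * v0_flat + w1 * v1_flat).reshape(v0.shape)

# Toy example: blend a "self-attention" tensor at t=0.3 and an "MLP" tensor at t=0.7,
# mirroring the per-filter interpolation values listed above.
attn_a, attn_b = np.random.randn(8, 8), np.random.randn(8, 8)
merged_attn = slerp(0.3, attn_a, attn_b)
mlp_a, mlp_b = np.random.randn(8, 8), np.random.randn(8, 8)
merged_mlp = slerp(0.7, mlp_a, mlp_b)
```

In the real merge, mergekit applies this kind of interpolation per tensor, choosing t according to the filter that matches the tensor's name (self_attn, mlp, or the 0.5 fallback).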
Potential Use Cases
Given its merged nature and Mistral-based architecture, T3Q-Merge-Mistral7B is suited to a range of applications (see the loading sketch after this list), including:
- General text generation and completion
- Question answering
- Summarization
- Chatbot development
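As a starting point for any of these use cases, the snippet below shows one way to load the model with the Hugging Face transformers library and generate text in bfloat16. The prompt and generation parameters are illustrative assumptions, and chat-style usage may additionally require the prompt template expected by the underlying source models.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chlee10/T3Q-Merge-Mistral7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

prompt = "Summarize the main benefits of model merging in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```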