jaspionjader/Kosmos-EVAA-mix-v35-8B
Kosmos-EVAA-mix-v35-8B by jaspionjader is an 8 billion parameter language model created by merging two pre-trained models, jaspionjader/test-19 and jaspionjader/test-18, using the SLERP method. The merge combines the strengths of its constituent models, applying specific layer ranges and parameter filters to the self-attention and MLP blocks. It is designed for general language generation tasks and uses the bfloat16 dtype for efficiency.
Model Overview
The jaspionjader/Kosmos-EVAA-mix-v35-8B is an 8 billion parameter language model resulting from a strategic merge of two distinct pre-trained models: jaspionjader/test-19 and jaspionjader/test-18. This model was constructed using the SLERP (Spherical Linear Interpolation) merge method, a technique known for smoothly combining model weights.
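To make the merge method concrete, here is a minimal sketch of spherical linear interpolation applied to two weight vectors. This is an illustration of the general SLERP formula, not the model's actual merge code; the near-colinear fallback to linear interpolation is a common implementation detail assumed here.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between weight vectors v0 and v1 at fraction t."""
    # Angle between the two vectors, computed on normalized copies
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    omega = np.arccos(dot)
    so = np.sin(omega)
    if abs(so) < eps:
        # Nearly colinear vectors: fall back to plain linear interpolation
        return (1.0 - t) * v0 + t * v1
    # Standard SLERP: weights follow the great-circle arc between v0 and v1
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

Unlike plain averaging, SLERP preserves the geometric relationship between the two weight sets, which is why it tends to blend models more smoothly.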
Merge Details
The merge process involved specific configurations to blend the capabilities of the base models. The mergekit tool was utilized, applying a bfloat16 dtype for the resulting model. Notably, the merge parameters included distinct filters for self_attn and mlp blocks, with varying interpolation values across different layers. This fine-grained control over the merging process aims to optimize the combined performance.
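A mergekit SLERP configuration of this kind typically looks like the following. The layer range and the interpolation values below are hypothetical placeholders for illustration; only the model names, merge method, filters, and dtype come from the description above.

```yaml
slices:
  - sources:
      - model: jaspionjader/test-19
        layer_range: [0, 32]   # hypothetical; actual range not stated
      - model: jaspionjader/test-18
        layer_range: [0, 32]
merge_method: slerp
base_model: jaspionjader/test-19   # assumption: either parent could be the base
parameters:
  t:
    - filter: self_attn
      value: [0.1, 0.3, 0.5, 0.3, 0.1]   # hypothetical per-layer schedule
    - filter: mlp
      value: [0.9, 0.7, 0.5, 0.7, 0.9]   # hypothetical per-layer schedule
    - value: 0.5                          # default for remaining tensors
dtype: bfloat16
```

The `filter` entries are what give the merge its block-specific behavior: self-attention and MLP tensors follow different interpolation schedules, while everything else uses the default value.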
Key Characteristics
- Architecture: Merged model from two base models (jaspionjader/test-19 and jaspionjader/test-18).
- Parameter Count: 8 billion parameters.
- Merge Method: SLERP, with detailed layer-wise and block-specific parameter interpolation.
- Data Type: Utilizes bfloat16 for efficient computation.
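The "layer-wise" interpolation noted above means that a short list of anchor values is spread smoothly across the model's depth, so each layer gets its own blend weight. A small sketch of that expansion, assuming mergekit's even spacing of anchors over layer depth:

```python
import numpy as np

def layerwise_t(anchors, num_layers):
    """Expand a short list of anchor t-values into one t per layer.

    Anchors are assumed to be spaced evenly from the first to the last layer,
    with linear interpolation in between (hypothetical but typical behavior).
    """
    anchor_positions = np.linspace(0.0, 1.0, num=len(anchors))
    layer_positions = np.linspace(0.0, 1.0, num=num_layers)
    return np.interp(layer_positions, anchor_positions, anchors)
```

For example, `layerwise_t([0.0, 1.0, 0.0], 5)` yields `[0.0, 0.5, 1.0, 0.5, 0.0]`: early and late layers lean toward one parent, middle layers toward the other.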
Potential Use Cases
Given its nature as a merged model, Kosmos-EVAA-mix-v35-8B is suitable for a range of general-purpose language generation and understanding tasks. Its specific merge configuration suggests an attempt to balance or enhance particular aspects of its constituent models, making it a candidate for applications where a blend of their individual strengths is desired.