jaspionjader/Kosmos-EVAA-mix-v35-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Jan 1, 2025 · Architecture: Transformer

Kosmos-EVAA-mix-v35-8B by jaspionjader is an 8 billion parameter language model created by merging two pre-trained models, jaspionjader/test-19 and jaspionjader/test-18, using the SLERP method. This merge combines the strengths of its constituent models, with specific layer ranges and parameter filters applied to self-attention and MLP blocks. It is designed for general language generation tasks, leveraging a bfloat16 dtype for efficiency.


Model Overview

The jaspionjader/Kosmos-EVAA-mix-v35-8B is an 8 billion parameter language model resulting from a strategic merge of two distinct pre-trained models: jaspionjader/test-19 and jaspionjader/test-18. This model was constructed using the SLERP (Spherical Linear Interpolation) merge method, a technique known for smoothly combining model weights.
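To make the SLERP idea concrete, here is a minimal sketch of spherical linear interpolation applied to two weight tensors. This is an illustrative implementation in NumPy, not the actual mergekit code: it computes the angle between the (normalized) flattened tensors and blends them along the arc, falling back to linear interpolation when they are nearly parallel.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns v0, t=1 returns v1; intermediate t values follow
    the arc between the two directions rather than a straight line.
    """
    v0_f = v0.flatten().astype(np.float64)
    v1_f = v1.flatten().astype(np.float64)
    # Angle between the two tensors, measured on normalized copies
    n0 = v0_f / (np.linalg.norm(v0_f) + eps)
    n1 = v1_f / (np.linalg.norm(v1_f) + eps)
    dot = np.clip(np.dot(n0, n1), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel tensors: plain linear interpolation is stable
        return ((1.0 - t) * v0_f + t * v1_f).reshape(v0.shape)
    s0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return (s0 * v0_f + s1 * v1_f).reshape(v0.shape)
```

Compared with plain linear averaging, SLERP preserves the geometric relationship between the two weight sets, which is why it is a popular choice for model merging.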

Merge Details

The merge process involved specific configurations to blend the capabilities of the base models. The merge was performed with the mergekit tool, with the resulting weights stored in bfloat16. Notably, the merge parameters define separate interpolation filters for the self_attn and mlp blocks, with interpolation values that vary across layers. This fine-grained control over the merging process aims to optimize the combined model's performance.
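The card does not publish the exact merge configuration, but a mergekit SLERP config of the kind described typically has the shape sketched below. The `layer_range` bounds, the `base_model` choice, and the per-filter `t` values here are illustrative placeholders, not the values actually used for this merge.

```yaml
# Illustrative mergekit SLERP config; values are placeholders
slices:
  - sources:
      - model: jaspionjader/test-19
        layer_range: [0, 32]
      - model: jaspionjader/test-18
        layer_range: [0, 32]
merge_method: slerp
base_model: jaspionjader/test-19
parameters:
  t:
    - filter: self_attn      # interpolation schedule for attention blocks
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp            # separate schedule for MLP blocks
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5             # default for all remaining tensors
dtype: bfloat16
```

The `filter` entries are what give the layer-wise, block-specific control described above: attention and MLP weights can lean toward different parent models at different depths.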

Key Characteristics

  • Architecture: Merged model from two base models (jaspionjader/test-19 and jaspionjader/test-18).
  • Parameter Count: 8 billion parameters.
  • Merge Method: SLERP, with detailed layer-wise and block-specific parameter interpolation.
  • Data Type: Utilizes bfloat16 for efficient computation.
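As a rough back-of-the-envelope check on what bfloat16 implies for deployment, the raw weights of a nominal 8B-parameter model occupy about 2 bytes per parameter. The sketch below computes only the weight footprint; activations, KV cache, and runtime overhead are extra.

```python
# Approximate memory footprint of the raw weights in bfloat16.
params = 8_000_000_000    # nominal 8B parameters
bytes_per_param = 2       # bfloat16 is 16 bits = 2 bytes
weight_gib = params * bytes_per_param / 2**30
print(f"~{weight_gib:.1f} GiB of weights")  # ~14.9 GiB
```

This is why half-precision formats like bfloat16 (or further quantization such as FP8) matter for fitting 8B-class models onto single-GPU setups.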

Potential Use Cases

Given its nature as a merged model, Kosmos-EVAA-mix-v35-8B is suitable for a range of general-purpose language generation and understanding tasks. Its specific merge configuration suggests an attempt to balance or enhance particular aspects of its constituent models, making it a candidate for applications where a blend of their individual strengths is desired.