Overview
Sela223/Aether-Script_12B is a 12-billion-parameter language model developed by Sela223. It is a merge of two pre-trained models, Sela223/Repose-Marlin-12B and Sela223/Captain-Foxfire-12B, combined with SLERP (spherical linear interpolation), a method that interpolates smoothly between model weights so the resulting model inherits characteristics from both parents.
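To make the merge method concrete, here is a minimal sketch of SLERP applied to a pair of weight tensors. This is a generic illustration of the technique, not mergekit's exact implementation; the function name and tolerance are illustrative choices.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the arc between v0 and v1 (treated as flat
    vectors), so intermediate points keep a sensible norm rather than
    cutting across the chord as plain linear interpolation does.
    """
    v0f, v1f = v0.ravel(), v1.ravel()
    # Cosine of the angle between the two weight vectors.
    cos = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f) + eps)
    cos = np.clip(cos, -1.0, 1.0)
    theta = np.arccos(cos)
    if theta < eps:
        # Nearly parallel tensors: fall back to ordinary linear interpolation.
        return (1.0 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1

# t=0 returns the first tensor, t=1 the second; t=0.5 lies midway on the arc.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)  # unit-length midpoint between two unit vectors
```

Because the interpolation follows the arc, merging two unit-norm tensors at any `t` yields another unit-norm tensor, which is the usual argument for preferring SLERP over simple weight averaging.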
Merge Details
This model was constructed with mergekit, a tool for combining language models. The configuration merges all 40 transformer layers from both base models, with per-component interpolation parameters applied to the attention projections (q_proj, k_proj, v_proj, o_proj), the MLP layers (gate_proj, up_proj, down_proj), and the layer norms. This fine-grained control over the interpolation factor is intended to balance how much each component inherits from each source model.
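The exact configuration file is not reproduced here, but a mergekit SLERP config of the shape described (all 40 layers, per-component interpolation factors) typically looks like the sketch below. The `t` values and the `base_model` choice are illustrative assumptions, not the settings actually used:

```yaml
# Hypothetical mergekit config; t values below are placeholders.
slices:
  - sources:
      - model: Sela223/Repose-Marlin-12B
        layer_range: [0, 40]
      - model: Sela223/Captain-Foxfire-12B
        layer_range: [0, 40]
merge_method: slerp
base_model: Sela223/Repose-Marlin-12B
parameters:
  t:
    - filter: self_attn   # interpolation factor for attention weights
      value: 0.5
    - filter: mlp         # interpolation factor for MLP weights
      value: 0.5
    - value: 0.5          # default for everything else (e.g. layer norms)
dtype: bfloat16
```

In mergekit's convention, `t` is the interpolation factor toward the second model, and `filter` entries let different weight groups use different factors.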
Key Characteristics
- Parameter Count: 12 billion.
- Merge Method: SLERP (spherical linear interpolation) of the parent models' weights.
- Constituent Models: Sela223/Repose-Marlin-12B and Sela223/Captain-Foxfire-12B.
Potential Use Cases
Given its merged nature, Aether-Script_12B is likely suitable for a range of general-purpose language generation and understanding tasks, benefiting from the combined knowledge and capabilities of its base models. Developers can experiment with it in applications that need a robust 12B-parameter foundation.