ajtaltarabukin2022/merged_beat_champ_3model_dare
The ajtaltarabukin2022/merged_beat_champ_3model_dare is a 32-billion-parameter language model created by ajtaltarabukin2022 by merging three pre-trained models with the DARE TIES method. It integrates components from dura-lori/affine-5DoKPQhZmKnFk4mNEmH4UorbqHDe3PFAPvEfJyDwNkimoAMe, RLStepone/Affine-h29-5Coip2NhkPhFCMLQ7LYs3zLVz9RSEZP7HJrakDeqM5RVdPs4, and fakemoonlo/Affine-5FnfLT3ntQXDsAnVC5H5WNQYVTY7SSCbxU3kxqhNybtJeNGb, aiming to combine the strengths of its constituent models while offering a 32,768-token context length for diverse natural language processing tasks.
Model Overview
The ajtaltarabukin2022/merged_beat_champ_3model_dare is a 32-billion-parameter language model developed by ajtaltarabukin2022. It was constructed with the MergeKit tool using the DARE TIES merge method. This technique combines DARE (Drop And REscale), which randomly drops a fraction of each fine-tuned model's parameter deltas and rescales the survivors, with the sign-consensus step of TIES-Merging, which resolves interference between models so that multiple pre-trained models' weights can be combined efficiently.
Merge Details
This model is a composite of three distinct base models, with dura-lori/affine-5DoKPQhZmKnFk4mNEmH4UorbqHDe3PFAPvEfJyDwNkimoAMe serving as the primary base. The other two models integrated into this merge are:
- RLStepone/Affine-h29-5Coip2NhkPhFCMLQ7LYs3zLVz9RSEZP7HJrakDeqM5RVdPs4
- fakemoonlo/Affine-5FnfLT3ntQXDsAnVC5H5WNQYVTY7SSCbxU3kxqhNybtJeNGb
The merge configuration applied specific weighting to each model's layers, with the base model contributing 45%, RLStepone 30%, and fakemoonlo 25% across layers 0 to 64. The process also included parameters for density (0.85) and normalization (1.0), with bfloat16 as the data type.
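Based on the values stated above, the MergeKit configuration would look roughly like the following sketch. This is a reconstruction from the described weights, density, normalization, and dtype, not the author's published config file, so field placement may differ from the original:

```yaml
merge_method: dare_ties
base_model: dura-lori/affine-5DoKPQhZmKnFk4mNEmH4UorbqHDe3PFAPvEfJyDwNkimoAMe
models:
  - model: dura-lori/affine-5DoKPQhZmKnFk4mNEmH4UorbqHDe3PFAPvEfJyDwNkimoAMe
    parameters:
      weight: 0.45    # base model contributes 45% across layers 0-64
      density: 0.85   # DARE keeps ~85% of each delta's entries
  - model: RLStepone/Affine-h29-5Coip2NhkPhFCMLQ7LYs3zLVz9RSEZP7HJrakDeqM5RVdPs4
    parameters:
      weight: 0.30
      density: 0.85
  - model: fakemoonlo/Affine-5FnfLT3ntQXDsAnVC5H5WNQYVTY7SSCbxU3kxqhNybtJeNGb
    parameters:
      weight: 0.25
      density: 0.85
parameters:
  normalize: 1.0
dtype: bfloat16
```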
Key Characteristics
- Parameter Count: 32 billion parameters.
- Context Length: Supports a context window of 32768 tokens.
- Merge Method: Utilizes the DARE TIES method for combining model weights, which aims to preserve and enhance capabilities from the merged components.
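To make the merge method concrete, the sketch below illustrates DARE TIES on toy NumPy arrays: each model's delta from the base is randomly pruned and rescaled (DARE), then a per-parameter sign election keeps only contributions agreeing with the majority sign (TIES). This is a simplified illustration of the general technique, not MergeKit's actual implementation, and the function names are hypothetical:

```python
import numpy as np

def dare_prune(delta, density, rng):
    # DARE: randomly drop a (1 - density) fraction of the delta's entries,
    # then rescale survivors by 1/density so the expected value is unchanged.
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def dare_ties_merge(base, finetuned, weights, density=0.85, seed=0):
    rng = np.random.default_rng(seed)
    # Task vectors: each fine-tuned model's offset from the shared base.
    deltas = [dare_prune(m - base, density, rng) for m in finetuned]
    # TIES sign election: the majority sign of the weighted deltas wins
    # at each parameter position.
    elected = np.sign(sum(w * d for w, d in zip(weights, deltas)))
    total = np.zeros_like(base)
    norm = np.zeros_like(base)
    for w, d in zip(weights, deltas):
        # Keep only contributions whose sign agrees with the elected sign.
        agree = (np.sign(d) == elected) & (d != 0)
        total += np.where(agree, w * d, 0.0)
        norm += np.where(agree, w, 0.0)
    # Normalize by the summed weights of the agreeing contributions.
    merged_delta = np.divide(total, norm,
                             out=np.zeros_like(total), where=norm > 0)
    return base + merged_delta

# Toy usage with three "models" and the weights described above.
base = np.zeros(16)
models = [base + 1.0, base + 0.8, base - 0.5]
merged = dare_ties_merge(base, models, weights=[0.45, 0.30, 0.25])
```

With density below 1.0 the result is stochastic (it depends on which entries DARE drops); in the degenerate case `density=1.0` and agreeing signs, the merge reduces to a weighted average of the deltas.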
Potential Use Cases
Given its large parameter count and extended context length, this model is suitable for applications requiring:
- Advanced text generation and comprehension.
- Tasks benefiting from a broad understanding of context.
- Exploration of combined capabilities from its constituent models.