Zachary1150/merge_linear_cos0.5fmt0.5_MRL4096_ROLLOUT4_LR1e-6 is a 1.5 billion parameter language model created by Zachary1150, formed by a linear merge of two pre-trained models. It combines its constituent models, cos_MRL4096_ROLLOUT4_LR1e-6 and fmt_MRL4096_ROLLOUT4, with equal weighting, and is intended for general language tasks, with the aim of balancing the strengths of both parents.
Overview
Zachary1150/merge_linear_cos0.5fmt0.5_MRL4096_ROLLOUT4_LR1e-6 is a 1.5 billion parameter language model with a 131,072 token context length. It was created by Zachary1150 using the mergekit tool and the Linear merge method. This model is a composite of two distinct base models, cos_MRL4096_ROLLOUT4_LR1e-6 and fmt_MRL4096_ROLLOUT4, each contributing 50% of the weight in the merge process.
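A linear merge is conceptually simple: each parameter of the merged model is a weighted average of the corresponding parameters in the source models. The sketch below illustrates this with toy "state dicts" of plain Python lists (the real merge operates on full tensors via mergekit, not this hypothetical helper):

```python
# Minimal sketch of a linear merge: every parameter in the merged model is a
# weighted average of the corresponding parameters in the source models.
# Tensors are represented as plain lists of floats for illustration.

def linear_merge(state_dicts, weights):
    """Average each named parameter across models with the given weights."""
    assert abs(sum(weights) - 1.0) < 1e-9, "merge weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Two toy "models" with a single parameter tensor each, merged 50/50,
# mirroring the equal cos/fmt weighting described above.
model_a = {"layer.weight": [1.0, 2.0, 3.0]}
model_b = {"layer.weight": [3.0, 4.0, 5.0]}
merged = linear_merge([model_a, model_b], weights=[0.5, 0.5])
print(merged["layer.weight"])  # → [2.0, 3.0, 4.0]
```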
Key Characteristics
- Architecture: A linearly merged model, combining two base models with equal weighting.
- Parameter Count: 1.5 billion parameters.
- Context Length: Supports an extensive context of 131,072 tokens.
- Merge Method: Uses mergekit's `linear` merge method, a simple weighted average of model parameters.
- Precision: Merged using the `bfloat16` data type for efficiency.
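A mergekit configuration matching the characteristics above would look roughly like the following. This is a hedged reconstruction, not the author's published config; the model paths are inferred from the merge name:

```yaml
# Hypothetical mergekit config reconstructing this merge:
# 50/50 linear average of the two parents in bfloat16.
merge_method: linear
models:
  - model: Zachary1150/cos_MRL4096_ROLLOUT4_LR1e-6
    parameters:
      weight: 0.5
  - model: Zachary1150/fmt_MRL4096_ROLLOUT4
    parameters:
      weight: 0.5
dtype: bfloat16
```

With mergekit installed, a config like this is typically run via `mergekit-yaml config.yml ./merged-model`.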
Potential Use Cases
This model is suitable for applications requiring a balance of capabilities derived from its constituent models. Its large context window makes it potentially useful for tasks involving:
- Processing and generating long-form text.
- Applications where understanding extensive contextual information is crucial.
- General language understanding and generation tasks where the combined strengths of the merged models are beneficial.