Zachary1150/merge_linear_cos0.5fmt0.5_MRL4096_ROLLOUT4_LR1e-6
Hosted on Hugging Face · Text generation · 1.5B parameters · BF16 · 32k context · Transformer architecture

Zachary1150/merge_linear_cos0.5fmt0.5_MRL4096_ROLLOUT4_LR1e-6 is a 1.5 billion parameter language model created by Zachary1150, formed by a linear merge of two pre-trained models. It combines its constituent models, cos_MRL4096_ROLLOUT4_LR1e-6 and fmt_MRL4096_ROLLOUT4, with equal weighting. The model is intended for general language tasks, with the merge aiming to balance the strengths of both source models.


Overview

Zachary1150/merge_linear_cos0.5fmt0.5_MRL4096_ROLLOUT4_LR1e-6 is a 1.5 billion parameter language model with a 131,072 token context length. It was created by Zachary1150 using the mergekit tool and the Linear merge method. This model is a composite of two distinct base models, cos_MRL4096_ROLLOUT4_LR1e-6 and fmt_MRL4096_ROLLOUT4, each contributing 50% of the weight in the merge process.
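A merge like the one described above is typically specified in a mergekit YAML configuration. The sketch below reconstructs a plausible config from the details on this card (linear method, 50/50 weights, bfloat16); the `Zachary1150/` repository paths for the source models are assumptions, not confirmed locations:

```yaml
# Hypothetical mergekit config matching the described merge.
# Source model paths are assumed; adjust to the actual repositories.
models:
  - model: Zachary1150/cos_MRL4096_ROLLOUT4_LR1e-6
    parameters:
      weight: 0.5
  - model: Zachary1150/fmt_MRL4096_ROLLOUT4
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

With a config like this, `mergekit-yaml config.yml ./output-dir` would produce the merged checkpoint.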

Key Characteristics

  • Architecture: A linearly merged model, combining two base models with equal weighting.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: Supports an extensive context of 131,072 tokens.
  • Merge Method: mergekit's Linear merge method, which computes a weighted average of the constituent models' parameters.
  • Precision: Merged using bfloat16 data type for efficiency.

Potential Use Cases

This model is suitable for applications requiring a balance of capabilities derived from its constituent models. Its large context window makes it potentially useful for tasks involving:

  • Processing and generating long-form text.
  • Applications where understanding extensive contextual information is crucial.
  • General language understanding and generation tasks where the combined strengths of the merged models are beneficial.