Zachary1150/merge_linear_cos0.1fmt0.9_MRL4096_ROLLOUT4_LR1e-6
Text Generation · 1.5B parameters · BF16

The Zachary1150/merge_linear_cos0.1fmt0.9_MRL4096_ROLLOUT4_LR1e-6 model is a 1.5 billion parameter language model created by Zachary1150, formed by a linear merge of two pre-trained checkpoints. It supports a 131072-token context window, making it suitable for tasks requiring extensive contextual understanding. Its composition blends the 'cos' and 'fmt' base checkpoints with a 0.1/0.9 weighting, placing most of the emphasis on the 'fmt' model while retaining some of the 'cos' model's behavior. It is designed for applications that benefit from merged model capabilities and a very long context window.


Model Overview

This model, Zachary1150/merge_linear_cos0.1fmt0.9_MRL4096_ROLLOUT4_LR1e-6, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed with the Linear merge method via mergekit, combining two distinct pre-trained base checkpoints. The merge assigns a weight of 0.1 to the 'cos' checkpoint and 0.9 to the 'fmt' checkpoint, a targeted blend that strongly favors the latter's characteristics.
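To make the merge procedure concrete, the sketch below shows what a linear (weighted-average) merge with these weights does to the parameters. The actual model was produced with mergekit; the checkpoint paths here are hypothetical placeholders, and this is a minimal illustration of the idea rather than the exact pipeline.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical local paths standing in for the 'cos' and 'fmt' actor checkpoints,
# with the 0.1/0.9 weighting from the merge configuration.
checkpoint_weights = {
    "path/to/cos_MRL4096_ROLLOUT4_LR1e-6": 0.1,
    "path/to/fmt_MRL4096_ROLLOUT4": 0.9,
}

# Load each checkpoint and keep its state dict.
state_dicts = {}
container = None
for path in checkpoint_weights:
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)
    state_dicts[path] = model.state_dict()
    container = model  # reuse the last model as a container for the merged weights

# Linear merge: every parameter is the weighted sum of the corresponding tensors.
merged_state = {
    name: sum(
        weight * state_dicts[path][name].float()
        for path, weight in checkpoint_weights.items()
    ).to(torch.bfloat16)
    for name in container.state_dict()
}

container.load_state_dict(merged_state)
container.save_pretrained("merge_linear_cos0.1fmt0.9_bf16")
```

This weighted averaging is only well defined because both checkpoints share the same architecture and parameter names, which is also the assumption mergekit's linear method relies on.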

Key Characteristics

  • Architecture: A merged model, combining /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/fmt_MRL4096_ROLLOUT4/global_step_50/actor/huggingface.
  • Merge Method: Utilizes the Linear merge method (weighted parameter averaging), as described in the Model Soups paper (arxiv.org/abs/2203.05482).
  • Parameter Weighting: The merge configuration applied a weight of 0.1 to the 'cos' base model and 0.9 to the 'fmt' base model, suggesting a strong emphasis on the latter's characteristics.
  • Data Type: The model was merged using bfloat16 precision.
  • Context Length: Features a substantial context window of 131072 tokens (a loading and generation sketch follows this list).
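
The snippet below is a minimal sketch of loading the model in bfloat16 and running text generation with the transformers library. It assumes the repository follows a standard causal-language-model layout; the prompt and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_linear_cos0.1fmt0.9_MRL4096_ROLLOUT4_LR1e-6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge was performed in bfloat16
    device_map="auto",
)

prompt = "Explain, in two sentences, what a linear model merge does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```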

Potential Use Cases

This model is particularly suited for applications that can benefit from:

  • Leveraging the combined strengths of its constituent base models.
  • Tasks requiring processing and understanding of very long input sequences due to its extended context window (a long-document sketch follows this list).
  • Scenarios where a specific blend of model capabilities, as defined by the 0.1/0.9 weighting, is advantageous.
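
As a rough illustration of the long-context use case, the sketch below tokenizes a long document, checks its length against the model's configured position limit, and asks for a summary. It reuses the tokenizer and model objects from the loading sketch above; the document path and prompt are hypothetical.

```python
# Reuses `tokenizer` and `model` from the loading sketch above.
long_text = open("long_report.txt", encoding="utf-8").read()  # hypothetical document
prompt = f"Summarize the following report in one paragraph:\n\n{long_text}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(
    f"Prompt length: {inputs['input_ids'].shape[1]} tokens "
    f"(configured limit: {model.config.max_position_embeddings})"
)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```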