Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 24, 2025 | Architecture: Transformer | Warm

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear is a 1.5-billion-parameter language model created by Zachary1150. It is a merge of two pre-trained language models, combined with the Linear merge method at equal weights. Its description gives a context length of 131072 tokens.

Model Overview

merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear is a 1.5-billion-parameter language model developed by Zachary1150. It was built with MergeKit using the Linear merge method.

Merge Details

The model combines two pre-trained language models, each with a weight of 0.5. The merge used the bfloat16 data type with normalization enabled. The constituent models were:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
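
The arithmetic behind an equal-weight linear merge can be illustrated with a minimal sketch. This is not MergeKit's implementation: the function name `linear_merge`, the toy parameter name `layer.weight`, and the use of plain Python lists in place of tensors are all assumptions made for illustration. The sketch takes a weighted average of matching parameters and, when normalization is on, first rescales the weights to sum to 1.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of matching parameters (illustrative sketch only).

    state_dicts: list of dicts mapping parameter name -> list of floats.
    weights: one scalar weight per model.
    normalize: if True, rescale the weights so they sum to 1.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        weights = [w / total for w in weights]

    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        # Element-wise weighted sum across all models.
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged


# Hypothetical toy "checkpoints" standing in for the two constituent models.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 6.0]}
merged = linear_merge([model_a, model_b], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 4.0]}
```

With weights of 0.5 and 0.5 the weights already sum to 1, so normalization leaves them unchanged and each merged parameter is the midpoint of the two source values.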

Key Characteristics

  • Parameter Count: 1.5 billion parameters.
  • Context Length: 131072 tokens according to the model description; the listing metadata reports 32k.
  • Merge Method: Linear, that is, a weighted average of the constituent models' parameters.

Potential Use Cases

Because it averages two checkpoints and is described as supporting a long context window, this model may suit text-generation applications that benefit from the combined behavior of both constituents and need to process long inputs.
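
For trying the model on a text-generation task, a minimal loading sketch with the Hugging Face `transformers` library might look like the following. It assumes the repository is publicly available under the ID shown, that it loads through `AutoModelForCausalLM`, and that `torch` and `transformers` are installed; none of this is confirmed by the listing. The helper name `generate_text` is hypothetical.

```python
MODEL_ID = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear"


def generate_text(prompt, max_new_tokens=128):
    """Load the model and generate a continuation (sketch, untested here)."""
    # Imported lazily so the module can be inspected without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 matches the quantization shown in the listing (assumption:
    # the hardware in use supports it).
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Explain linear model merging in one sentence.")` would download the weights on first use and return the decoded output as a string.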