Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear

Hugging Face

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear is a 1.5 billion parameter language model created by Zachary1150 through a linear merge of two pre-trained models. It features a long context length of 131,072 tokens, making it suitable for tasks requiring extensive contextual understanding. Its primary differentiation is its merged architecture, which combines a length-focused base model with an accuracy/formatting-focused one to potentially improve behavior on length control and output formatting. It is designed for applications that benefit from processing and generating very long sequences of text.


Model Overview

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the linear merge method from mergekit, which aims to combine the strengths of its two constituent checkpoints: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface.
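A merge like this is typically driven by a mergekit YAML configuration. The sketch below is a reconstruction from the model name and the stated 0.1/0.9 weights, not the author's published config; the dtype is assumed from the BF16 quantization listed above.

```yaml
# Hypothetical mergekit config reconstructed from the model card (not the original).
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
    parameters:
      weight: 0.1
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.9
dtype: bfloat16
```

With mergekit installed, a config like this would be run via `mergekit-yaml config.yml ./output-dir`.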

Key Characteristics

  • Merged Architecture: Utilizes a linear merge with specific weighting (0.1 for the 'len' model and 0.9 for the 'accfmt' model) to balance contributions from its base components.
  • Extended Context Length: Features a notable context window of 131,072 tokens, allowing for processing and generation of very long texts.
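Concretely, a linear merge is a per-parameter weighted average of the two checkpoints. The sketch below illustrates the arithmetic with tiny NumPy stand-ins for real state dicts; it shows the 0.1/0.9 weighting described above, not mergekit's actual implementation.

```python
import numpy as np

def linear_merge(state_dicts, weights):
    """Weighted average of matching parameter tensors across models."""
    assert len(state_dicts) == len(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical one-tensor "models" standing in for the real checkpoints.
len_model = {"layer.weight": np.array([1.0, 2.0])}
accfmt_model = {"layer.weight": np.array([3.0, 4.0])}

# 0.1 for the 'len' model, 0.9 for the 'accfmt' model, as in this merge.
merged = linear_merge([len_model, accfmt_model], [0.1, 0.9])
print(merged["layer.weight"])  # [2.8 3.8]
```

Because the weights sum to 1.0, the merged parameters stay on the line segment between the two checkpoints, here sitting much closer to the 'accfmt' model.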

Potential Use Cases

  • Long-form Content Analysis: Ideal for tasks requiring deep understanding and summarization of extensive documents, codebases, or conversations.
  • Context-heavy Generation: Suitable for generating coherent and contextually relevant text over many thousands of tokens.
  • Research and Experimentation: Provides a merged model for researchers exploring the effects of linear merging on models with specific formatting and length-related pre-training.