Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear is a 1.5 billion parameter language model created by Zachary1150, featuring an extended context length of 131072 tokens. This model is a linear merge of two pre-trained models, specifically designed to combine their respective strengths. It is optimized for tasks benefiting from a very long context window, making it suitable for applications requiring extensive textual understanding or generation.


Model Overview

This model, merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear, is a 1.5 billion parameter language model developed by Zachary1150. It stands out due to its exceptionally long context length of 131072 tokens, enabling it to process and generate very extensive texts.
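If the model is published on the Hugging Face Hub under the repository name above, loading it should follow the standard transformers pattern. The sketch below is a minimal, assumed setup (the `device_map` and dtype choices are typical defaults, not documented requirements of this model):

```python
# Minimal loading sketch, assuming the standard Hugging Face transformers API
# and that the repository id matches the model name shown above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 data type used for the merge
    device_map="auto",           # assumption: place weights automatically on available devices
)
```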

Merge Details

The model was constructed using the Linear merge method via mergekit. This technique combines the weights of multiple pre-trained models to create a new, unified model. Specifically, this merge integrated two distinct base models:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

The merge configuration applied a weight of 0.7 to the first model and 0.3 to the second, with normalization enabled and bfloat16 as the data type. This specific weighting suggests an emphasis on the characteristics of the len_MRL4096 base model.
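To make the arithmetic concrete, the following is a simplified sketch of what a weighted linear merge with normalization computes. It is an illustrative stand-in rather than mergekit's actual implementation, and the checkpoint paths are placeholders for the two local checkpoints listed above:

```python
# Illustrative sketch of a weighted linear merge with normalization
# (a simplified stand-in for mergekit's Linear method, not its actual code).
import torch
from transformers import AutoModelForCausalLM

# Placeholder paths standing in for the two checkpoints listed above,
# paired with the 0.7 / 0.3 weights from the merge configuration.
sources = [
    ("/path/to/len_MRL4096_ROLLOUT4_LR5e-7_checkpoint", 0.7),
    ("/path/to/accfmt_MRL4096_ROLLOUT4_LR5e-7_checkpoint", 0.3),
]

merged = None
total_weight = sum(w for _, w in sources)  # 1.0 for this configuration

for path, weight in sources:
    state = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype=torch.bfloat16
    ).state_dict()
    if merged is None:
        # Accumulate in float32 for numerical stability.
        merged = {k: torch.zeros_like(v, dtype=torch.float32) for k, v in state.items()}
    for name, tensor in state.items():
        merged[name] += weight * tensor.float()

# With normalization enabled, divide by the sum of the weights,
# then cast back to bfloat16 as specified in the merge configuration.
merged = {k: (v / total_weight).to(torch.bfloat16) for k, v in merged.items()}
```

Because the weights already sum to 1.0, normalization leaves the result unchanged here; it matters when the configured weights do not sum to one.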

Key Characteristics

  • Extended Context Window: A primary feature is its 131072-token context length, significantly larger than many models of its size.
  • Merged Architecture: Benefits from the combined knowledge and capabilities of its constituent base models through a linear merge.

Potential Use Cases

This model is particularly well-suited for applications requiring:

  • Long-document analysis: Summarization, question-answering, or information extraction from very long texts (see the usage sketch after this list).
  • Code generation/understanding: Processing large codebases or complex programming logic.
  • Creative writing: Generating extensive narratives or detailed scenarios where context retention is crucial.
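
For the long-document analysis scenario, a minimal sketch is shown below. It reuses the `model` and `tokenizer` objects from the earlier loading sketch; the input file name and prompt wording are hypothetical placeholders:

```python
# Long-document summarization sketch; assumes `model` and `tokenizer`
# were created as in the loading sketch above. File name and prompt
# wording are hypothetical.
with open("long_report.txt") as f:
    document = f.read()

prompt = f"{document}\n\nSummarize the key points of the document above."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"Prompt length: {inputs.input_ids.shape[1]} tokens")

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(summary)
```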