Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear

Hugging Face · Text Generation
Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear is a 1.5-billion-parameter language model created by Zachary1150 using the Linear merge method. It combines two pre-trained base models with the aim of integrating their respective strengths. With a context length of 131,072 tokens, it is suited to tasks requiring extensive contextual understanding. Its main point of differentiation is that it was produced by model merging rather than by training from scratch.


Model Overview

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method via the mergekit tool, combining two distinct pre-trained base models. This approach aims to synthesize the capabilities of its constituent models into a single, more versatile model.

Merge Details

The model integrates two base models, with specific weighting applied during the merge process:

  • One base model received a weight of 0.9.
  • The second base model received a weight of 0.1.

This configuration emphasizes the characteristics of the first base model while incorporating aspects of the second. The merge was performed with normalize: true, which rescales the weights to sum to 1, and dtype: bfloat16, which stores the merged parameters in the half-precision format commonly used for model inference.
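The arithmetic behind a Linear merge is a per-tensor weighted average. The following is a minimal sketch of that rule using toy numbers, not the real model weights; the function name and values are illustrative, and bfloat16 casting is omitted for simplicity:

```python
# Minimal sketch of the linear (weighted-average) merge rule that a
# "linear" merge applies independently to each parameter tensor.
# Toy values only -- not the actual base models of this merge.

def linear_merge(params, weights, normalize=True):
    """Element-wise weighted average of matching parameter lists."""
    if normalize:  # corresponds to `normalize: true` in the merge config
        total = sum(weights)
        weights = [w / total for w in weights]
    return [sum(w * p[i] for w, p in zip(weights, params))
            for i in range(len(params[0]))]

# One toy "parameter vector" from each of the two base models,
# combined with the 0.9 / 0.1 weighting described above.
base_a = [1.0, 2.0, 3.0]
base_b = [5.0, 6.0, 7.0]
merged = linear_merge([base_a, base_b], [0.9, 0.1])
print([round(x, 6) for x in merged])
```

Because the 0.9 / 0.1 weights already sum to 1, normalization is a no-op here; it matters when a config's weights do not sum to 1, in which case each merged tensor would otherwise be scaled up or down.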

Key Characteristics

  • Architecture: Merged language model.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: Supports a long context window of 131,072 tokens, enabling processing of extensive inputs.

Potential Use Cases

This model is suitable for applications where combining the strengths of different specialized base models is beneficial, particularly in scenarios demanding a large context window. Its merged nature suggests it might excel in tasks that require a blend of capabilities from its constituent models, such as complex reasoning, long-form content generation, or detailed analysis over extended texts.