Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear

Task: Text Generation | Model size: 1.5B | Quantization: BF16 | Context length: 32k | Published: Dec 24, 2025 | Architecture: Transformer | Concurrency cost: 1

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear is a 1.5-billion-parameter language model published by Zachary1150. It was produced with the mergekit framework by linearly merging two checkpoints, that is, by taking a weighted average of their parameters. It supports a context length of up to 131072 tokens, which makes it a candidate for tasks that involve long inputs.

Model Overview

This model, developed by Zachary1150, is a 1.5-billion-parameter language model created with a linear merge in the mergekit framework. It combines two separately trained checkpoints, identified here by their local training paths:

  • First: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • Second: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface

Merge Details

The linear merge used the following weights:

  • The first checkpoint (cos_MRL4096_ROLLOUT4_LR2e-6, step 40) was assigned a weight of 0.3.
  • The second checkpoint (accfmt_MRL4096_ROLLOUT4_LR2e-6, step 30) was assigned a weight of 0.7.

These weights favor the second checkpoint. The merge was performed in the bfloat16 data type with normalization enabled. Because the two weights already sum to 1, normalization (dividing the weighted sum by the sum of the weights) leaves the result unchanged.
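Written out for a single parameter tensor, a normalized linear merge computes the following. Here θ₁ denotes a parameter of the first checkpoint (cos_MRL4096_ROLLOUT4_LR2e-6), θ₂ the matching parameter of the second (accfmt_MRL4096_ROLLOUT4_LR2e-6), and w₁, w₂ are their weights. The same formula is applied independently to every parameter tensor in the model.

```latex
\theta_{\text{merged}}
  = \frac{w_1\,\theta_1 + w_2\,\theta_2}{w_1 + w_2}
  = \frac{0.3\,\theta_1 + 0.7\,\theta_2}{0.3 + 0.7}
  = 0.3\,\theta_1 + 0.7\,\theta_2
```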

Key Characteristics

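  • Merged Weights: Combines two separately trained checkpoints that share the same architecture, with the aim of inheriting behavior from both.
  • Linear Merge Method: Each parameter of the merged model is a weighted average of the corresponding parameters of the two checkpoints. This is a simple merging technique that requires no additional training.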
  • Large Context Window: Features a context length of 131072 tokens, suitable for processing very long sequences of text.
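The weighted averaging behind a linear merge can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not mergekit's implementation. The `linear_merge` function and the toy "state dicts" (flat lists of floats keyed by made-up parameter names such as `layer.weight`) are hypothetical. mergekit itself operates on real model tensors and handles data types such as bfloat16.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def linear_merge(models: List[StateDict], weights: List[float],
                 normalize: bool = True) -> StateDict:
    """Sketch of a linear merge: per-parameter weighted average across models."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    # With normalization, the weighted sum is divided by the sum of the weights.
    total = sum(weights) if normalize else 1.0
    merged: StateDict = {}
    for name in models[0]:
        tensors = [m[name] for m in models]
        # A linear merge requires every model to have the same parameter shapes.
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) / total
            for values in zip(*tensors)
        ]
    return merged


if __name__ == "__main__":
    # Toy checkpoints with hypothetical parameter names and values.
    cos_ckpt = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    accfmt_ckpt = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(linear_merge([cos_ckpt, accfmt_ckpt], [0.3, 0.7]))
```

With normalization enabled, only the ratio of the weights matters. Passing weights of 3 and 7 produces the same merged values as 0.3 and 0.7.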

Potential Use Cases

Given its merged nature and large context, this model could be beneficial for applications requiring:

  • Comprehensive document analysis.
  • Long-form content generation or summarization.
  • Tasks that may benefit from combining the behaviors of the two merged checkpoints.