Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear

Hugging Face · Text generation · Model size: 1.5B · Precision: BF16 · Published: Dec 24, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method via mergekit. It combines two pre-trained base models, cos_MRL4096_ROLLOUT4_LR2e-6 and accfmt_MRL4096_ROLLOUT4_LR2e-6, with weights of 0.9 and 0.1, respectively. It is designed for general language tasks, leveraging the strengths of its merged components, and supports a context length of 131072 tokens.


Model Overview

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the mergekit tool, specifically employing the Linear merge method.

Merge Details

This model is a blend of two distinct pre-trained language models:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface

The merge configuration assigned a weight of 0.9 to the cos_MRL4096_ROLLOUT4_LR2e-6 model and a weight of 0.1 to the accfmt_MRL4096_ROLLOUT4_LR2e-6 model. The merge was configured to normalize the weights (so they sum to 1) and to use the bfloat16 data type.
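The details above correspond to a mergekit configuration along these lines (a sketch reconstructed from the stated settings; field layout follows mergekit's standard YAML schema):

```yaml
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.9
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.1
parameters:
  normalize: true
dtype: bfloat16
```

Such a file is typically passed to `mergekit-yaml` along with an output directory to produce the merged checkpoint.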

Key Characteristics

  • Parameter Count: 1.5 billion parameters
  • Context Length: 131072 tokens
  • Merge Method: Linear merging, combining the strengths of its constituent models.
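Conceptually, a linear merge is just a weighted average of corresponding parameter tensors from the two source models. A minimal sketch of the idea in plain Python (operating on lists rather than real model state dicts; the function name and structure are illustrative, not mergekit's actual implementation):

```python
def linear_merge(params_a, params_b, w_a=0.9, w_b=0.1, normalize=True):
    """Weighted element-wise average of two parameter dicts (a toy stand-in
    for model state dicts). With normalize=True, the weights are rescaled
    to sum to 1, mirroring mergekit's `normalize` option."""
    if normalize:
        total = w_a + w_b
        w_a, w_b = w_a / total, w_b / total
    return {
        name: [w_a * x + w_b * y for x, y in zip(params_a[name], params_b[name])]
        for name in params_a
    }

# Toy example: a single two-element "parameter" per model
merged = linear_merge({"w": [1.0, 2.0]}, {"w": [3.0, 4.0]})
print(merged["w"])  # each element is 0.9 * a + 0.1 * b
```

With weights of 0.9 and 0.1 that already sum to 1, normalization is a no-op here, but it keeps the result well-scaled if the configured weights do not sum to 1.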

Potential Use Cases

This model is suitable for general language generation and understanding tasks, benefiting from the combined capabilities of its merged components. Its large context window makes it potentially useful for applications requiring extensive contextual understanding.