Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear

Hugging Face
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

The Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear model is a 1.5-billion-parameter language model created by Zachary1150 as a linear merge of two pre-trained checkpoints, built with the MergeKit framework. Its main differentiator is the merging methodology itself: the specific checkpoints chosen and the precise weighting applied to each, which makes it suitable for tasks that benefit from this particular combination of underlying model strengths. The model has a context length of 131,072 tokens.


Model Overview

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear is a 1.5 billion parameter language model developed by Zachary1150. It was created using the MergeKit framework, which allows for the combination of multiple pre-trained language models into a single, unified model. This particular model utilizes a Linear merge method to blend its components.
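A linear merge is simply a weighted average of the two models' parameters, taken tensor by tensor. The sketch below illustrates the idea on toy "state dicts" (plain Python lists standing in for weight tensors). It is a simplified illustration of the technique, not MergeKit's actual implementation, and the `linear_merge` helper name is hypothetical:

```python
def linear_merge(state_dict_a, state_dict_b, weight_a=0.7, weight_b=0.3,
                 normalize=True):
    """Element-wise linear combination of two models' parameters.

    Computes weight_a * A + weight_b * B for every parameter tensor.
    With normalize=True, the weights are rescaled to sum to 1, matching
    MergeKit's `normalize` option for the linear method.
    """
    if normalize:
        total = weight_a + weight_b
        weight_a, weight_b = weight_a / total, weight_b / total
    merged = {}
    for name, params_a in state_dict_a.items():
        params_b = state_dict_b[name]  # both models must share parameter names
        merged[name] = [weight_a * a + weight_b * b
                        for a, b in zip(params_a, params_b)]
    return merged


# Toy example: parameter name -> flat list of weights.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 4.0]}
merged = linear_merge(model_a, model_b)  # layer.weight ≈ [1.6, 2.6]
```

In a real merge the same arithmetic is applied to full `torch` tensors loaded from each checkpoint, with the result cast to the configured dtype (here, bfloat16).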

Merge Details

The model is a result of merging two specific pre-trained checkpoints:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR2e-6/global_step_54/actor/huggingface

During the merge process, a specific weighting was applied to each constituent model: the acc_MRL4096_ROLLOUT4_LR2e-6 checkpoint received a weight of 0.7, while the accfmt_MRL4096_ROLLOUT4_LR2e-6 checkpoint received a weight of 0.3. The merge configuration also specified bfloat16 as the dtype and enabled weight normalization. This weighted linear combination aims to retain the strengths of both base models in a single model.
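Put together, the details above correspond to a MergeKit configuration along these lines. This YAML is a reconstruction from the stated parameters, not the author's actual config file:

```yaml
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR2e-6/global_step_54/actor/huggingface
    parameters:
      weight: 0.7
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.3
dtype: bfloat16
parameters:
  normalize: true
```

A config like this would be run with `mergekit-yaml config.yml ./output-dir` to produce the merged checkpoint.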

Potential Use Cases

Given its origin as a merge of specialized checkpoints, this model is likely suitable for:

  • Research into model merging techniques: Understanding the impact of specific weighting and linear merging on performance.
  • Applications requiring a blend of capabilities: Where the individual strengths of the merged models are complementary.
  • Experiments with custom model architectures: For developers looking to fine-tune or build upon a uniquely merged base.