Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear is a 1.5 billion parameter language model created by Zachary1150 using the linear merge method via mergekit. It combines two pre-trained actor models, weighting 'accfmt_MRL4096_ROLLOUT4_LR5e-7' at 0.9 and 'acc_MRL4096_ROLLOUT4_LR5e-7' at 0.1. The model is intended for tasks that benefit from the combined strengths of its constituent models and offers a 131,072-token context length.


Model Overview

This model, merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method provided by mergekit, combining the capabilities of two distinct pre-trained models.

Merge Details

The merge process specifically combined two actor models:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
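A mergekit configuration consistent with these details might look like the sketch below. The keys follow mergekit's linear-merge format, but this file is reconstructed from the description rather than copied from the original:

```yaml
# Reconstructed mergekit config (sketch, not the original file)
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.9
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.1
merge_method: linear
parameters:
  normalize: true
dtype: bfloat16
```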

Configuration and Weighting

The merge applied a specific weighting to its components:

  • The model accfmt_MRL4096_ROLLOUT4_LR5e-7 received a weight of 0.9.
  • The model acc_MRL4096_ROLLOUT4_LR5e-7 received a weight of 0.1.

The merge was run with weight normalization enabled and used the bfloat16 data type. The resulting model retains a 131,072-token context length, making it suitable for applications requiring extensive contextual understanding.
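The arithmetic behind a linear merge is simple: each parameter of the merged model is the weighted sum of the corresponding parameters of the source models, with the weights optionally normalized to sum to 1. The toy sketch below illustrates this on plain floats rather than real model tensors; the function name and dictionary layout are illustrative, not mergekit's actual API:

```python
def linear_merge(params_a, params_b, w_a=0.9, w_b=0.1, normalize=True):
    """Linearly combine two parameter dicts: theta = w_a * theta_a + w_b * theta_b.

    params_a / params_b map parameter names to values (floats here;
    tensors in a real merge). With normalize=True the weights are
    rescaled to sum to 1, mirroring mergekit's `normalize` option.
    """
    if normalize:
        total = w_a + w_b
        w_a, w_b = w_a / total, w_b / total
    # Element-wise weighted sum over matching parameter names.
    return {k: w_a * params_a[k] + w_b * params_b[k] for k in params_a}


# With weights 0.9 and 0.1 (already summing to 1), a parameter with
# value 1.0 in model A and 3.0 in model B merges to 0.9*1.0 + 0.1*3.0 = 1.2.
merged = linear_merge({"layer.weight": 1.0}, {"layer.weight": 3.0})
```

In the real merge the same weighted sum is applied to every tensor in the two checkpoints' state dicts, with the result cast to bfloat16.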