Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method. This model combines two pre-trained actor models, specifically "cos_MRL4096_ROLLOUT4_LR2e-6" and "accfmt_MRL4096_ROLLOUT4_LR2e-6", with equal weighting. It is designed for general language generation tasks, leveraging the combined strengths of its constituent models.


Model Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the mergekit tool, specifically employing the Linear merge method.

Merge Details

The model is a blend of two distinct pre-trained "actor" models, each contributing equally with a weight of 0.5. The constituent models are:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface

This linear merging approach combines the learned representations and capabilities of both base models into a single, unified model. The merge was configured to normalize the merge weights and to use the bfloat16 data type.
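The original merge configuration is not shown on this page, but a mergekit config consistent with the details above (Linear method, equal 0.5 weights, weight normalization, bfloat16) would look roughly like this sketch:

```yaml
# Sketch only; the exact original YAML is not published on this page.
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
parameters:
  normalize: true
dtype: bfloat16
```

With `normalize: true`, mergekit rescales the listed weights to sum to 1, so equal weights of 0.5 yield a plain average of the two checkpoints.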

Potential Use Cases

Given that it is a linear merge of two actor models, this model is likely suited to:

  • General text generation and completion tasks.
  • Applications requiring a blend of capabilities from its base models.
  • Exploration in research settings for understanding model merging techniques.
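For readers exploring merging techniques, the Linear method used here reduces to a weighted elementwise average of the two models' parameters. A minimal, framework-free sketch (parameters represented as plain lists of floats, not actual checkpoint tensors):

```python
def linear_merge(state_dicts, weights):
    """Return the weighted elementwise average of several state dicts.

    Weights are normalized to sum to 1, mirroring mergekit's
    `normalize` option; with two models at weight 0.5 each this is
    a plain average of corresponding parameters.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(norm, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged


# Toy example with a single two-element "parameter":
a = {"layer.weight": [1.0, 2.0]}
b = {"layer.weight": [3.0, 4.0]}
merged = linear_merge([a, b], [0.5, 0.5])
# merged["layer.weight"] is [2.0, 3.0]
```

In practice mergekit performs the same averaging over real tensors, but this captures the arithmetic behind the `w0.5_linear` suffix in the model name.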