Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear

Hugging Face · Text Generation

  • Concurrency Cost: 1
  • Model Size: 1.5B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: Dec 20, 2025
  • Architecture: Transformer

The Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear model is a 1.5 billion parameter language model created by Zachary1150 using a linear merge. It combines two pre-trained checkpoints, cos_MRL4096_ROLLOUT4_LR5e-7 and accfmt_MRL4096_ROLLOUT4_LR5e-7, and supports a 131072 token context length. The merge is intended to combine the strengths of both constituent models in a single checkpoint.


Overview

This model, Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear, is a 1.5 billion parameter language model with a 131072 token context length. It was constructed by Zachary1150 using the Linear merge method via mergekit.
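A mergekit linear merge is typically driven by a YAML configuration. The card does not include the original config, but based on the stated method, weights, and settings, it would plausibly look like the following sketch (paths taken from the merge details below; treat the exact layout as an assumption):

```yaml
# Hypothetical reconstruction of the mergekit config for this model.
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.9
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.1
merge_method: linear
parameters:
  normalize: true
dtype: bfloat16
```

Such a config would be run with mergekit's CLI (e.g. `mergekit-yaml config.yml ./output-dir`) to produce the merged checkpoint.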

Merge Details

The model is a composite of two distinct pre-trained language models:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

The two checkpoints were merged with weights of 0.9 (first model) and 0.1 (second model). The merge was configured to normalize the weights and to store parameters in bfloat16, combining the two models' capabilities into a single checkpoint.
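Numerically, a normalized linear merge is just a per-parameter weighted average: each merged tensor is the weight sum divided by the total weight (a no-op here, since 0.9 + 0.1 = 1.0). A minimal sketch of that arithmetic, using plain Python lists in place of model tensors (the function name is illustrative, not from mergekit):

```python
def linear_merge(params_a, params_b, w_a=0.9, w_b=0.1, normalize=True):
    """Per-parameter weighted average of two models (linear merge).

    With normalize=True the result is divided by the total weight,
    so the weights need not sum to 1.
    """
    total = (w_a + w_b) if normalize else 1.0
    return [(w_a * a + w_b * b) / total for a, b in zip(params_a, params_b)]

# Example: each merged value is 0.9 * a + 0.1 * b when weights sum to 1.
merged = linear_merge([1.0, 2.0], [3.0, 4.0])
```

In a real merge the same averaging is applied tensor-by-tensor across the two state dicts, with the result cast to bfloat16 as configured.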