Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear

Hosted on Hugging Face · Text generation · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method. It combines two pre-trained components, 'cos_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7', with weights of 0.3 and 0.7 respectively. The merge is designed to leverage the strengths of its constituent models for general language understanding and generation tasks, and the model offers a substantial 131,072 token context length.


Model Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method via the mergekit tool, combining the weights of two distinct pre-trained models.
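The Linear merge method is a weighted average of the two models' parameter tensors. The sketch below illustrates the idea on toy data; it is not mergekit's implementation, and the function and tensor names are illustrative only:

```python
# Minimal sketch of a linear (weighted-average) merge with weight
# normalization. Real merges operate on full model state dicts of
# torch tensors; plain Python lists stand in for tensors here.

def linear_merge(tensors_a, tensors_b, w_a=0.3, w_b=0.7, normalize=True):
    """Merge two same-shaped parameter dicts by weighted average."""
    if normalize:
        # Rescale the weights so they sum to 1.0.
        total = w_a + w_b
        w_a, w_b = w_a / total, w_b / total
    return {
        name: [w_a * a + w_b * b for a, b in zip(tensors_a[name], tensors_b[name])]
        for name in tensors_a
    }

# Toy example: one "layer" with four parameters per model.
a = {"layer.weight": [1.0, 2.0, 3.0, 4.0]}
b = {"layer.weight": [5.0, 6.0, 7.0, 8.0]}
merged = linear_merge(a, b)  # each entry is 0.3*a + 0.7*b
```

With the weights already summing to 1.0, normalization is a no-op here; it matters when the configured weights do not sum to 1.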

Merge Details

The model integrates two base models:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

These models were merged with a specific weighting: the first model received a weight of 0.3 and the second a weight of 0.7. The merge also enabled weight normalization and used bfloat16 as the data type. This approach aims to combine the learned representations and capabilities of the constituent models into a single, more robust language model, offering a notable 131,072 token context window.
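The model card does not include the merge configuration file, but a mergekit YAML reproducing the stated settings would plausibly look like the following (reconstructed from the parameters above, not taken from the original config):

```yaml
# Reconstructed mergekit config: linear merge, weights 0.3/0.7,
# normalization enabled, bfloat16 dtype.
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.3
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.7
merge_method: linear
parameters:
  normalize: true
dtype: bfloat16
```

A config like this would typically be run with `mergekit-yaml config.yml ./output-dir` to produce the merged checkpoint.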