Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties

Text Generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model produced with the TIES merge method on top of the base model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two fine-tuned checkpoints, 'cos_MRL4096_ROLLOUT4_LR2e-6' and 'accfmt_MRL4096_ROLLOUT4_LR2e-6', each contributing a weight of 0.5, with the aim of pooling their strengths into a single compact model for general language understanding and generation.
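The model can be loaded like any Hugging Face causal language model. Below is a minimal usage sketch, assuming the repository ships standard transformers-compatible weights and tokenizer files (the prompt is illustrative):

```python
# A minimal usage sketch, assuming the repository ships standard
# transformers-compatible weights and tokenizer (the prompt is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```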


Model Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties, is a 1.5 billion parameter language model created by Zachary1150. It was produced with the TIES merge method (TrIm, Elect Sign & Merge), which combines multiple fine-tuned models into one by trimming each model's parameter updates, electing a per-parameter sign, and merging only the updates that agree with that sign. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
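To make the method concrete, here is a minimal single-tensor sketch of TIES merging, assuming task vectors (fine-tuned weights minus base weights) have already been computed; the function name and exact normalization details are illustrative, not taken from any particular library:

```python
# A minimal single-tensor sketch of TIES merging (assumption: task vectors,
# i.e. fine-tuned weights minus base weights, are precomputed). The function
# name and normalization details are illustrative, not from a specific library.
import torch

def ties_merge(base: torch.Tensor, task_vectors: list, weights: list,
               density: float = 0.5) -> torch.Tensor:
    # 1) Trim: in each task vector, keep only the top-`density` fraction
    #    of entries by magnitude; zero out the rest.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(density * tv.numel()))
        threshold = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    # 2) Elect sign: for every parameter, pick the sign with the larger
    #    total weighted magnitude across all task vectors.
    stacked = torch.stack([w * t for w, t in zip(weights, trimmed)])
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 3) Disjoint merge: combine only the entries whose sign agrees with
    #    the elected sign, normalizing by the total agreeing weight.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    w = torch.tensor(weights, dtype=base.dtype).view(-1, *([1] * base.dim()))
    merged = (stacked * agree).sum(dim=0)
    denom = (agree.to(base.dtype) * w).sum(dim=0).clamp(min=1e-8)

    # 4) Apply the merged task vector back onto the base weights.
    return base + merged / denom

# Toy example with two partially conflicting task vectors and equal 0.5 weights:
base = torch.zeros(4)
tv_a = torch.tensor([0.4, -0.2, 0.0, 0.1])
tv_b = torch.tensor([0.3, 0.5, 0.0, -0.1])
print(ties_merge(base, [tv_a, tv_b], weights=[0.5, 0.5], density=0.5))
```

Trimming and sign election are what distinguish TIES from plain weighted averaging: low-magnitude and sign-conflicting updates are dropped rather than averaged into interference.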

Merge Details

Two specific checkpoints were merged to create this model:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface

Each constituent model contributed a weight of 0.5 and was trimmed with a density of 0.5, meaning the top half of each task vector's parameters by magnitude was retained. The merge was configured to normalize the combined updates and was performed in bfloat16. This approach consolidates the learned representations of the individual checkpoints while discarding conflicting low-magnitude updates, potentially improving overall performance and generalization.
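The listed options (per-model weight and density, normalization, bfloat16) line up with the TIES parameters exposed by the mergekit library, so the configuration can plausibly be reconstructed as below. That mergekit was the tool used, and the placeholder checkpoint paths, are assumptions:

```python
# A hedged reconstruction of the merge configuration in mergekit's YAML schema
# (assumption: mergekit was the merging tool; checkpoint paths are placeholders).
import yaml  # pip install pyyaml

merge_config = {
    "merge_method": "ties",
    "base_model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "models": [
        {"model": "<local cos_MRL4096_ROLLOUT4_LR2e-6 checkpoint>",
         "parameters": {"weight": 0.5, "density": 0.5}},
        {"model": "<local accfmt_MRL4096_ROLLOUT4_LR2e-6 checkpoint>",
         "parameters": {"weight": 0.5, "density": 0.5}},
    ],
    "parameters": {"normalize": True},
    "dtype": "bfloat16",
}

with open("ties_merge.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)
# The merge would then be run with, e.g.: mergekit-yaml ties_merge.yaml ./merged
```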