Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties

Text Generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model produced by merging two fine-tuned checkpoints with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. The merge combines the strengths of its constituent checkpoints in a single model, offering a compact yet capable option for general language tasks.


Model Overview

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model created by Zachary1150. It is built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model and uses the TIES merge method to combine two fine-tuned checkpoints of that base into a single model.

Merge Details

This model was constructed with mergekit, specifically employing the TIES (TrIm, Elect Sign & Merge) merging technique. TIES reduces interference between merged models by trimming low-magnitude parameter changes, electing a majority sign for each parameter, and merging only the changes that agree with the elected sign. The two models merged were:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
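A mergekit configuration consistent with the details in this card might look like the following. This is a reconstructed sketch, not the actual config used for the merge, which has not been published; the `dtype` setting in particular is an assumption based on the BF16 quantization listed above.

```yaml
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
parameters:
  normalize: true
dtype: bfloat16
```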

Each constituent model contributed a weight of 0.5 and a density of 0.5 during the merge, with normalization applied. This balanced configuration integrates the learned representations of both sources equally.
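For intuition, the trim / elect-sign / merge steps of TIES with these weight, density, and normalization settings can be sketched on flat parameter vectors. This is a toy illustration of the arithmetic, not mergekit's actual implementation:

```python
import numpy as np

def ties_merge(base, task_models, weights, density):
    """Toy TIES merge over flat parameter vectors (illustrative only)."""
    # Task vectors: each model's change relative to the shared base
    deltas = [m - base for m in task_models]

    # Trim: keep only the top `density` fraction of entries by magnitude
    trimmed = []
    for d in deltas:
        k = int(np.ceil(density * d.size))
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))

    # Elect sign: per-parameter sign of the weighted sum of trimmed deltas
    elected = np.sign(sum(w * t for w, t in zip(weights, trimmed)))

    # Merge: combine only entries whose sign agrees with the elected sign,
    # normalizing by the total weight of the agreeing models
    merged = np.zeros_like(base)
    counts = np.zeros_like(base)
    for t, w in zip(trimmed, weights):
        agree = (np.sign(t) == elected) & (t != 0)
        merged += np.where(agree, w * t, 0.0)
        counts += np.where(agree, w, 0.0)
    merged = np.divide(merged, counts, out=np.zeros_like(merged),
                       where=counts > 0)
    return base + merged

# Example: weight 0.5 / density 0.5 for both models, as in this card
base = np.zeros(4)
m1 = base + np.array([1.0, -2.0, 0.5, 0.0])
m2 = base + np.array([1.0, 2.0, -0.5, 3.0])
out = ties_merge(base, [m1, m2], weights=[0.5, 0.5], density=0.5)
print(out)  # → [1. 0. 0. 3.]
```

Note how the second parameter is dropped entirely: the two models disagree on its sign with equal weight, so no majority sign is elected and neither change survives.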

Potential Use Cases

Given its foundation and merging strategy, this model is suitable for general language generation and understanding tasks where a 1.5B parameter model is appropriate. Its merged nature suggests it might exhibit a broader range of capabilities than its individual components, making it a versatile option for various NLP applications.