Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties is a 1.5-billion-parameter language model merge, created with the DARE TIES method using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base. It combines two fine-tuned checkpoints into a single model and is aimed at applications that need a compact yet capable model produced by a principled merging technique.


Model Overview

This model, merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties, is a 1.5-billion-parameter language model merge. It was constructed with the DARE TIES merging method, which randomly drops a fraction of each fine-tuned model's parameter deltas and rescales the remainder (DARE), then resolves sign conflicts among the surviving deltas before adding them back to the base model (TIES). The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.

Merge Details

The merge combined the following two fine-tuned checkpoints:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface

Each checkpoint was assigned a weight of 0.5 and a density of 0.5, and parameter normalization was enabled. The merge was implemented with mergekit, yielding a single model that integrates features from both constituents.
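The original configuration file is not reproduced in this card, but from the details above it would look roughly like the following mergekit YAML. This is a hypothetical reconstruction: the dtype field is an assumption inferred from the BF16 quantization listed above, and the actual file shipped with the model may differ.

```yaml
# Hypothetical reconstruction of the merge configuration, not the original file.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16  # assumed from the BF16 quantization noted above
```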

Potential Use Cases

Given its origin as a merge of specialized models, this model could be suitable for:

  • Applications where a compact 1.5B parameter model is preferred.
  • Scenarios benefiting from the combined capabilities of its merged components.
  • Research into model merging techniques, particularly DARE TIES.
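Since the base model is Qwen2-based, the merge should load through the standard transformers causal LM interface. The snippet below is a minimal usage sketch, not an officially tested example; it assumes the merged model retains its base model's chat template, and device_map="auto" additionally requires the accelerate package.

```python
# Minimal usage sketch. Assumes the merged model keeps the standard
# Qwen2 chat template and causal LM interface of its base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the BF16 weights listed above
    device_map="auto",       # requires the accelerate package
)

messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```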