Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 1.5B
  • Quantization: BF16
  • Context length: 32k
  • Published: Dec 25, 2025
  • Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150 on top of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It was produced with the DARE TIES merge method, which combined two model checkpoints. The model description lists a context length of 131072 tokens, which suits tasks that require extensive contextual understanding.


Model Overview

This model, merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties, is a 1.5 billion parameter language model developed by Zachary1150. It was built with mergekit using the DARE TIES merge method, as described in the DARE TIES paper.

Merge Details

The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Two model checkpoints were merged on top of it:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface

Each of these constituent models was assigned a weight of 0.5 and a density of 0.5 during the merging process. The configuration also specified normalize: true and dtype: bfloat16.
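The card states the merge settings but does not reproduce the configuration file. The sketch below is a hypothetical reconstruction: it assembles a Python dict laid out like a mergekit YAML config (merge_method, base_model, a models list with per-model parameters, a top-level parameters block, dtype) and fills it with the values listed above. It is not the author's actual file, `build_merge_config` is an illustrative helper, and the local checkpoint paths are the author's private directories, so they cannot be downloaded.

```python
import json

# Hypothetical reconstruction: values come from the model card, layout follows
# mergekit's YAML config conventions. The local paths are the author's private
# checkpoints and are not publicly available.
BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
SOURCE_CHECKPOINTS = [
    "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/"
    "acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface",
    "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/"
    "accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface",
]


def build_merge_config(sources, weight=0.5, density=0.5):
    """Return a dict mirroring a mergekit DARE TIES config (hypothetical helper)."""
    return {
        "merge_method": "dare_ties",
        "base_model": BASE_MODEL,
        "models": [
            # Each source checkpoint gets the same weight and density.
            {"model": path, "parameters": {"weight": weight, "density": density}}
            for path in sources
        ],
        # Global options stated on the card.
        "parameters": {"normalize": True},
        "dtype": "bfloat16",
    }


config = build_merge_config(SOURCE_CHECKPOINTS)
print(json.dumps(config, indent=2))
```

Serialized to YAML, a config of this shape is the kind of input mergekit's `mergekit-yaml` command takes, together with an output directory.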

Key Characteristics

  • Parameter Count: 1.5 billion parameters.
  • Context Length: 131072 tokens according to the model description; the listing header above reports 32k.
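  • Merge Method: DARE TIES. DARE randomly drops a fraction (1 − density) of each model's parameter deltas relative to the base model and rescales the surviving deltas by 1/density; TIES-style sign election then keeps, for each parameter, only the deltas that agree with the dominant sign before they are combined. The aim is to reduce interference between the merged models.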
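To make the roles of weight, density, and normalize concrete, here is a toy, pure-Python sketch of the DARE TIES idea on flat lists of floats. It is loosely modeled on how mergekit combines these options but is not mergekit's implementation: it works on plain lists rather than tensors, elects the sign from the summed weighted deltas (one common variant of TIES sign election), and the function names `dare_sparsify` and `dare_ties_merge` are hypothetical.

```python
import random


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def dare_sparsify(delta, density, rng):
    """DARE step: keep each entry with probability `density`, rescale kept ones by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, weights, density, normalize=True, seed=0):
    """Merge fine-tuned parameter vectors into `base` with a DARE + TIES-style procedure."""
    rng = random.Random(seed)
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    sparse = [dare_sparsify(d, density, rng) for d in deltas]
    merged = []
    for i, b in enumerate(base):
        weighted = [w * s[i] for w, s in zip(weights, sparse)]
        # TIES sign election: the sign of the summed weighted deltas wins.
        elected = sign(sum(weighted))
        # Disjoint merge: only nonzero entries matching the elected sign contribute.
        agree = [v != 0 and sign(v) == elected for v in weighted]
        mixed = sum(v for v, a in zip(weighted, agree) if a)
        if normalize:
            # Divide by the total weight of the models that actually contributed.
            divisor = sum(w for w, a in zip(weights, agree) if a)
            if divisor:
                mixed /= divisor
        merged.append(b + mixed)
    return merged


base = [0.0, 0.0, 0.0]
model_a = [1.0, -1.0, 2.0]
model_b = [3.0, 0.5, -1.0]
# density=1.0 disables random dropping, so this call is deterministic.
print(dare_ties_merge(base, [model_a, model_b], weights=[0.5, 0.5], density=1.0))
# → [2.0, -1.0, 2.0]
```

In the second and third positions the two models disagree in sign, so only one delta survives the election; with normalize enabled that survivor is divided by its own weight (0.5) and therefore contributes at full size instead of being halved. With density 0.5, as in this merge, each delta entry is either zeroed or doubled before the sign election.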

Potential Use Cases

Because the merge is built on DeepSeek-R1-Distill-Qwen-1.5B, a small reasoning model distilled from DeepSeek-R1, it is most plausibly suited to the same step-by-step reasoning tasks as its base. Its long context window also makes it a candidate for applications that process extensive text.