Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150. It is a merge of two pre-trained checkpoints, built on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model and combined with the DARE TIES merge method. With a context length of 131072 tokens, it is suited to tasks that require extensive contextual understanding. The model targets general language generation and understanding, and the merged weights are intended to retain the strengths of both constituent checkpoints, for potentially enhanced performance.
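For reference, below is a minimal text-generation sketch using the Hugging Face transformers library. It assumes the checkpoint is published on the Hub under the repo id above; the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch; assumes the merged checkpoint is available on the
# Hugging Face Hub under this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "Solve: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```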


Overview

This model, Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties, is a 1.5 billion parameter language model developed by Zachary1150. It was created with the DARE TIES merge method, which applies DARE's drop-and-rescale sparsification to each constituent model's task vector (its parameter delta from the base) and then resolves sign conflicts between the sparsified deltas as in TIES-Merging. The base model is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
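To make the method concrete, here is a simplified, single-tensor illustration of DARE TIES. This is not mergekit's actual implementation; the function and variable names are illustrative assumptions.

```python
# Simplified, single-tensor illustration of DARE TIES (illustrative only).
import torch

def dare_ties_merge(base, finetuned, weights, density=0.5, normalize=True):
    """Merge fine-tuned tensors into `base` via DARE drop-and-rescale
    followed by TIES-style sign election on the task vectors."""
    deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base                          # task vector vs. the base
        keep = torch.rand_like(delta) < density    # DARE: keep with prob = density
        deltas.append(w * delta * keep / density)  # rescale surviving entries
    stacked = torch.stack(deltas)
    elected = torch.sign(stacked.sum(dim=0))       # TIES: elect a per-parameter sign
    agree = torch.sign(stacked) == elected         # drop disagreeing contributions
    merged = (stacked * agree).sum(dim=0)
    if normalize:                                  # rescale by participating weight
        wts = torch.tensor(weights).view(-1, *([1] * base.dim()))
        merged = merged / (agree * wts).sum(dim=0).clamp(min=1e-8)
    return base + merged
```

With two models at weight 0.5 and density 0.5, as in this merge, each checkpoint contributes roughly half of its delta entries, doubled in magnitude to compensate for the dropped half, and conflicting signs are resolved in favor of the weighted majority.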

Merge Details

The merge combines two constituent models, referenced by their local checkpoint paths:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

Each constituent model contributes with a weight of 0.5 and a density of 0.5, and the merge process normalizes the combined parameters. This configuration aims to combine the strengths of the two checkpoints, potentially improving performance across language tasks. The model supports a context length of 131072 tokens.
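The stated settings map onto a merge configuration roughly like the one written out below. This is a hypothetical reconstruction: mergekit is assumed as the merging tool (it is the usual tool for dare_ties merges, though the card does not name it), and the local checkpoint paths are copied verbatim from the list above and will not resolve outside the author's machine.

```python
# Hypothetical reconstruction of the merge config implied by the card:
# merge_method dare_ties, weight 0.5 and density 0.5 per model, normalization
# enabled, BF16 weights. mergekit is an assumption, not confirmed by the card.
import pathlib
import textwrap

config = textwrap.dedent("""\
    merge_method: dare_ties
    base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    models:
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
    parameters:
      normalize: true
    dtype: bfloat16
""")

pathlib.Path("dare_ties_config.yml").write_text(config)
# The merge would then be produced with mergekit's CLI:
#   mergekit-yaml dare_ties_config.yml ./merged-model
```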