Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties

Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5-billion-parameter language model produced with the TIES merge method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It combines two actor checkpoints, acc_MRL4096_ROLLOUT4_LR5e-7 and accfmt_MRL4096_ROLLOUT4_LR5e-7, each contributing with a weight of 0.5 and a density of 0.5. The model inherits a 131,072-token context length from its base, making it suitable for tasks that require processing long inputs.
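A minimal usage sketch with the Hugging Face transformers library is shown below. The repository id comes from this card; the prompt and generation settings are illustrative assumptions, not recommendations from the model author.

```python
# Minimal usage sketch (assumes `torch` and `transformers` are installed;
# the prompt and generation settings are illustrative, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge itself was performed in bfloat16
    device_map="auto",
)

inputs = tokenizer("Solve: what is 12 * 17?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```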


Model Overview

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model created by Zachary1150. It was developed by merging pre-trained language models using the TIES merge method, which is detailed in the TIES paper.
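To make the method concrete, here is a minimal, self-contained PyTorch sketch of the TIES procedure for a single weight tensor: trim each task vector to its largest-magnitude entries, elect a per-parameter sign, then average only the entries that agree with it. The function names, the toy tensors, and the exact normalization are illustrative assumptions; the 0.5 weight and 0.5 density mirror this card's configuration, but this is not the mergekit implementation.

```python
# Illustrative sketch of TIES merging for one weight tensor; not the
# implementation actually used to build this model.
import torch

def trim(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Keep only the top `density` fraction of entries by magnitude."""
    k = max(int(delta.numel() * density), 1)
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    return torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))

def ties_merge(base, checkpoints, weights, density):
    # 1. Trim: sparsify each weighted task vector (checkpoint minus base).
    deltas = [w * trim(ckpt - base, density) for ckpt, w in zip(checkpoints, weights)]
    # 2. Elect signs: per-parameter majority sign of the summed trimmed deltas.
    sign = torch.stack(deltas).sum(dim=0).sign()
    # 3. Disjoint merge: average only entries whose sign matches the elected one
    #    (the division plays the role of `normalize: true` in the merge config).
    agree = [(d.sign() == sign) & (d != 0) for d in deltas]
    numer = sum(d * m for d, m in zip(deltas, agree))
    denom = sum(m.to(base.dtype) for m in agree).clamp(min=1.0)
    return base + numer / denom

# Toy demo: random 4x4 tensors stand in for a single layer's weights.
torch.manual_seed(0)
base = torch.randn(4, 4)
ckpt_a = base + 0.1 * torch.randn(4, 4)
ckpt_b = base + 0.1 * torch.randn(4, 4)
merged = ties_merge(base, [ckpt_a, ckpt_b], weights=[0.5, 0.5], density=0.5)
```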

Merge Details

This model is built on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model. The merge combined two actor checkpoints:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

Each of these checkpoints contributed a weight of 0.5 and a density of 0.5 during the TIES merge, and the configuration additionally specified normalize: true and dtype: bfloat16. With a context length of 131,072 tokens, the model is suited to applications that require processing extensive inputs.
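For reference, the merge described above can be expressed as a mergekit-style configuration. The sketch below reconstructs it in Python and serializes it to YAML; the method, base model, paths, weights, densities, normalize, and dtype are taken from this card, but treating mergekit as the tool actually used is an assumption.

```python
# Plausible reconstruction of the merge configuration described above,
# serialized to the YAML consumed by tools such as mergekit (assumption:
# whether mergekit was the actual tool used is not stated on this card).
import yaml

config = {
    "merge_method": "ties",
    "base_model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "models": [
        {
            "model": "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface",
            "parameters": {"weight": 0.5, "density": 0.5},
        },
        {
            "model": "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface",
            "parameters": {"weight": 0.5, "density": 0.5},
        },
    ],
    "parameters": {"normalize": True},
    "dtype": "bfloat16",
}

with open("merge_config.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
# The merge could then be run with, e.g.: mergekit-yaml merge_config.yml ./merged
```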