Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 1, 2026 · Architecture: Transformer · Warm

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2 is a 1.5-billion-parameter language model merged with the DARE TIES method on top of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two specialized checkpoints, 'len_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7', each weighted 0.5 with a density of 0.2. The model supports a context length of 131072 tokens, suiting it to tasks that require long-range contextual understanding.


Model Overview

This model, developed by Zachary1150, is a 1.5-billion-parameter language model created by merging pre-trained checkpoints with the DARE TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model.
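
Example Usage

The model should load with the standard Hugging Face transformers API. The sketch below is illustrative rather than taken from the card; it assumes the merged weights are published under the repository ID in the title and loads them in bfloat16, the dtype used during the merge.

```python
# Minimal loading sketch (illustrative; not from the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the merge was performed in bfloat16
    device_map="auto",
)

prompt = "Summarize the key ideas of model merging in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```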

Merge Details

The merge process combined two distinct checkpoints:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

Both checkpoints were assigned a weight of 0.5 and a density of 0.2 in the DARE TIES merge, with normalization enabled. A density of 0.2 means only about 20% of each checkpoint's parameter deltas relative to the base model are retained (the kept deltas are rescaled to compensate) before the TIES sign-consensus and averaging steps. The merged model supports a context length of 131072 tokens and was produced in bfloat16.
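
The card does not name the tool used for the merge; DARE TIES merges are most commonly produced with mergekit, so the following sketch reconstructs a plausible, equivalent configuration from the parameters above. The config keys follow mergekit's schema, the checkpoint paths are the ones listed in the card, and the output directory is a placeholder.

```python
# Hypothetical reconstruction of the merge (mergekit is assumed; the card
# does not name its tool). Parameters mirror those stated above.
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

config_yaml = """
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
parameters:
  normalize: true
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(config_yaml))
run_merge(
    merge_config,
    "./merged-model",  # placeholder output directory
    options=MergeOptions(cuda=torch.cuda.is_available(), copy_tokenizer=True),
)
```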

Potential Use Cases

Given its merged nature and substantial context window, this model is likely optimized for tasks that benefit from:

  • Extended context understanding: Processing and generating text over very long documents or conversations.
  • Specialized capabilities: the names of the merged checkpoints ('len' and 'accfmt') hint at tuning for particular aspects of generation (plausibly output length and accuracy/formatting), though the README does not document what these checkpoints optimize.