Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. This model was developed using the DARE TIES merge method, combining two specialized pre-trained models. It features a substantial context length of 131072 tokens, making it suitable for tasks requiring extensive contextual understanding and processing.


Model Overview

merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a DARE TIES merge of two fine-tuned variants of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It retains the base model's 1.5 billion parameters and supports a context length of 131072 tokens.

Key Characteristics

  • Merge Method: Created using the DARE TIES merge method, which sparsifies each source model's parameter deltas relative to the base (DARE) and resolves sign conflicts between them (TIES) before combining, so each model's strengths are preserved with less interference.
  • Base Model: Derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, indicating a foundation in a Qwen-based architecture.
  • Merged Components: Integrates two distinct models, /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface, each weighted at 0.5.
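To make the merge method above concrete, here is a minimal, self-contained sketch of the two ideas behind DARE TIES, operating on toy parameter lists rather than the actual checkpoints. The function names, the toy values, and the `drop_rate` parameter are illustrative assumptions, not the card author's code; a real merge would be run with a merging toolkit over full model weights.

```python
import random

def dare_prune(delta, drop_rate, rng):
    """DARE step (sketch): randomly drop task-vector entries with
    probability drop_rate and rescale survivors by 1/(1 - drop_rate),
    keeping the expected delta unchanged."""
    return [0.0 if rng.random() < drop_rate else d / (1.0 - drop_rate)
            for d in delta]

def ties_merge(base, deltas, weights):
    """TIES step (sketch): per parameter, elect the dominant sign across
    the weighted deltas, discard entries disagreeing with it, and add the
    weighted mean of the survivors back onto the base parameter."""
    merged = []
    for i, b in enumerate(base):
        vals = [d[i] for d in deltas]
        total = sum(w * v for w, v in zip(weights, vals))
        sign = 1.0 if total >= 0 else -1.0
        kept = [(w, v) for w, v in zip(weights, vals) if v * sign > 0]
        if kept:
            wsum = sum(w for w, _ in kept)
            merged.append(b + sum(w * v for w, v in kept) / wsum)
        else:
            merged.append(b)  # no surviving agreement: keep the base value
    return merged

# Toy example: two "models" expressed as deltas from a shared base,
# merged with equal weights of 0.5 as on this card.
rng = random.Random(0)
base = [0.0, 0.0]
delta_a = dare_prune([1.0, -2.0], drop_rate=0.0, rng=rng)  # drop_rate 0 for determinism
delta_b = dare_prune([3.0, 2.0], drop_rate=0.0, rng=rng)
print(ties_merge(base, [delta_a, delta_b], weights=[0.5, 0.5]))  # → [2.0, 2.0]
```

Note how the second parameter illustrates sign election: the deltas -2.0 and +2.0 conflict, so only the entry matching the elected sign survives, instead of the two cancelling to zero as plain averaging would give.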

Potential Use Cases

Given its merged nature and large context window, this model is likely optimized for:

  • Tasks requiring the synthesis of capabilities from its constituent models.
  • Applications benefiting from processing and understanding very long texts or complex documents.
  • Research into model merging techniques and their impact on performance.