Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2

Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 1, 2026 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2 is a 1.5-billion-parameter language model merged from two specialized models with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base. The merge combines distinct capabilities from its two source models, likely targeting improved performance on specific language understanding or generation tasks. At 1.5B parameters it is efficient to deploy while supporting a context length of 131072 tokens, making it suitable for applications that require extensive contextual awareness.


Model Overview

This model, merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2, is a 1.5-billion-parameter language model created by Zachary1150. It was produced with the TIES merge method (TrIm, Elect Sign & Merge), which combines the strengths of two distinct fine-tuned models while resolving interference between their parameters. The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
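
As a standard checkpoint on the Hub, the model should load through the usual transformers API. A minimal sketch, assuming the repository is public and that the Qwen2 architecture inherited from DeepSeek-R1-Distill-Qwen-1.5B is supported by your transformers version:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2"

# BF16 matches the quantization listed on the card; device_map needs accelerate.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# DeepSeek-R1-Distill models ship a chat template, so we assume the merge
# inherits it from the base model.
messages = [{"role": "user", "content": "Explain model merging in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```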

Merge Details

The merge combined two source models, each contributing with a weight of 0.5 and a density of 0.2. In TIES terms, a density of 0.2 means only the top 20% of each model's parameter deltas (by magnitude) are retained, and the equal weights indicate a balanced integration of the two. Rather than simply averaging weights, TIES trims redundant deltas and resolves sign conflicts between the models, so the merge aims for a synergistic outcome rather than a plain average.
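
The card does not name the two source models, but the listed hyperparameters (TIES, weight 0.5, density 0.2, BF16) map directly onto a mergekit configuration. A hypothetical reconstruction follows; the placeholder model names are invented for illustration, and the `lenfmt` fragment of the repository name only hints that the components target length and formatting behavior:

```python
# Hypothetical reconstruction of the mergekit config implied by this card.
# The two source model names are NOT given on the card; the placeholders
# below stand in for whatever models were actually merged.
import yaml

merge_config = {
    "merge_method": "ties",
    "base_model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "dtype": "bfloat16",
    "models": [
        {"model": "placeholder/length-model",   # placeholder, not from the card
         "parameters": {"weight": 0.5, "density": 0.2}},
        {"model": "placeholder/format-model",   # placeholder, not from the card
         "parameters": {"weight": 0.5, "density": 0.2}},
    ],
}

with open("merge_config.yml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)

# The merge itself would then run via mergekit's CLI:
#   mergekit-yaml merge_config.yml ./merged-model
```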

Key Characteristics

  • Architecture: Based on the DeepSeek-R1-Distill-Qwen-1.5B family.
  • Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: A substantial context window of 131072 tokens, enough to process very long inputs, maintain extended conversational history, or reason over entire documents.
  • Merge Method: Utilizes TIES, which selectively merges parameters (trim, elect sign, merge) and can carry specialized capabilities over from the source models; see the sketch after this list.
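
To make the selective merge concrete, here is a minimal single-tensor sketch of the TIES procedure, written against the published algorithm rather than the exact mergekit implementation:

```python
import torch

def ties_merge_tensor(base: torch.Tensor,
                      tuned: list[torch.Tensor],
                      weight: float = 0.5,
                      density: float = 0.2) -> torch.Tensor:
    """Illustrative TIES merge for one parameter tensor.

    base:  the base-model tensor (here, from DeepSeek-R1-Distill-Qwen-1.5B)
    tuned: fine-tuned tensors of the same shape, one per source model
    """
    trimmed = []
    for t in tuned:
        delta = t - base
        # Trim: keep only the top `density` fraction of entries by magnitude.
        k = max(1, int(density * delta.numel()))
        threshold = delta.abs().flatten().topk(k).values.min()
        trimmed.append(torch.where(delta.abs() >= threshold, delta,
                                   torch.zeros_like(delta)))

    stacked = torch.stack(trimmed)
    # Elect sign: per entry, take the sign with the larger total magnitude.
    elected = torch.sign(stacked.sum(dim=0))
    # Merge: average only the surviving deltas that agree with the elected sign.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    summed = torch.where(agree, stacked, torch.zeros_like(stacked)).sum(dim=0)
    count = agree.sum(dim=0).clamp(min=1)
    return base + weight * summed / count

# Smoke test with random tensors standing in for real weights.
base = torch.randn(4, 4)
merged = ties_merge_tensor(base, [base + torch.randn(4, 4),
                                  base + torch.randn(4, 4)])
print(merged.shape)  # torch.Size([4, 4])
```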

Potential Use Cases

Given its large context window and specialized merge approach, this model could be particularly effective for:

  • Applications requiring deep contextual understanding over long documents (see the example after this list).
  • Tasks benefiting from the combined strengths of its merged components, such as enhanced reasoning or specific formatting adherence.
  • Scenarios where a 1.5B parameter model offers a good trade-off between performance and resource usage.
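
As a concrete instance of the long-document use case, a sketch that feeds a lengthy file through the chat template; the file path and prompt are illustrative, and `model` and `tokenizer` are assumed loaded as in the quick-start above:

```python
# Assumes `model` and `tokenizer` from the loading sketch in Model Overview.
with open("long_report.txt") as f:  # illustrative path
    document = f.read()

messages = [{
    "role": "user",
    "content": f"Summarize the key findings of this report:\n\n{document}",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The prompt fits as long as it stays within the advertised context window.
print(f"prompt length: {inputs.shape[-1]} tokens")

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```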