Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties
Text generation · 1.5B parameters · BF16 · 32k context · Published: Dec 25, 2025 · Transformer architecture

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150 using the DARE TIES merge method. It is based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and combines two specialized fine-tuned models into a single checkpoint intended for general language understanding and generation tasks.


Model Overview

The model was created with the DARE TIES merge method, a technique for combining multiple fine-tuned descendants of a common base model into a single set of weights. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, which provides the underlying architecture and tokenizer.

Merge Details

This model integrates two distinct pre-trained models into a unified set of weights, with each contributing model assigned a weight of 0.5 and a density of 0.5, giving both an equal influence on the result. DARE TIES combines two techniques: DARE randomly drops a fraction of each model's parameter deltas (keeping a proportion equal to the density) and rescales the survivors, while TIES resolves sign conflicts among the remaining deltas before they are applied to the base model.
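A merge with these settings is typically expressed as a mergekit configuration file. The sketch below is hypothetical: the two fine-tuned model names are placeholders, since the model card does not name the actual constituents; only the base model, method, weight, and density values come from the card.

```yaml
# Hypothetical mergekit config illustrating the settings described above.
# The two "models" entries are placeholders, not the real constituents.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: example-org/finetune-a   # placeholder
    parameters:
      weight: 0.5
      density: 0.5
  - model: example-org/finetune-b   # placeholder
    parameters:
      weight: 0.5
      density: 0.5
dtype: bfloat16
```

With both weights and densities equal, neither contributor dominates: each keeps roughly half of its delta parameters, rescaled to compensate for the drop.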

Key Characteristics

  • Architecture: Merged model based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
  • Parameter Count: 1.5 billion parameters.
  • Merge Method: DARE TIES (drop-and-rescale of parameter deltas with sign-conflict resolution).
  • Context Length: Supports a context length of 131072 tokens.
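To make the merge procedure concrete, here is a small NumPy sketch of the DARE TIES idea on toy weight vectors: drop each delta entry with probability 1 − density and rescale survivors (DARE), then elect a per-parameter sign and average only the agreeing deltas (TIES). This is an illustrative simplification, not mergekit's exact implementation.

```python
import numpy as np

def dare(delta, density, rng):
    # DARE: randomly keep each delta entry with probability `density`,
    # rescaling survivors by 1/density so the expected value is preserved.
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def ties_merge(base, finetuned, weights, density, rng):
    # Sparsify each model's delta (finetuned - base) via DARE.
    deltas = [dare(ft - base, density, rng) for ft in finetuned]
    weighted = [w * d for w, d in zip(weights, deltas)]
    # TIES sign election: majority sign of the summed weighted deltas.
    elected = np.sign(sum(weighted))
    merged = np.zeros_like(base)
    counts = np.zeros_like(base)
    for d in weighted:
        agree = np.sign(d) == elected
        merged = np.where(agree, merged + d, merged)
        counts = np.where(agree & (d != 0), counts + 1, counts)
    # Average the agreeing deltas and add them back onto the base weights.
    merged = np.where(counts > 0, merged / np.maximum(counts, 1), 0.0)
    return base + merged

rng = np.random.default_rng(0)
base = np.zeros(8)
ft_a = base + rng.normal(size=8)
ft_b = base + rng.normal(size=8)
merged = ties_merge(base, [ft_a, ft_b], weights=[0.5, 0.5],
                    density=0.5, rng=rng)
print(merged.shape)  # (8,)
```

With density 1.0 and a single contributor at weight 1.0, the procedure reduces to returning that contributor's weights unchanged, which is a useful sanity check.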

Potential Use Cases

This model is suitable for a variety of natural language processing tasks, including text generation, summarization, and question answering, benefiting from the combined strengths of its constituent models.