Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties_density0.2
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Jan 1, 2026 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties_density0.2 is a 1.5 billion parameter language model created by Zachary1150. It is a merge of two pre-trained models, acc_MRL4096 and accfmt_MRL4096, using the DARE TIES method with deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base. This model is designed for tasks benefiting from the combined strengths of its merged components, offering a 131072 token context length.


Model Overview

This model, Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties_density0.2, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the DARE TIES merge method, which combines the weights of multiple pre-trained models. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.

Merge Details

The model integrates two distinct pre-trained components: acc_MRL4096 and accfmt_MRL4096. Each was merged with a weight of 0.5 and a density of 0.2, meaning only about 20% of each component's delta parameters (its differences from the base model) were retained, with the surviving values rescaled to compensate. This approach aims to leverage the specific capabilities or knowledge encoded in each source model, potentially yielding a more robust or specialized merged model. The DARE TIES method combines DARE's random dropping and rescaling of delta parameters with TIES-Merging's sign-consensus resolution of parameter conflicts, and is known for combining models effectively while mitigating interference between their weight updates.
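The parameters described above correspond to a mergekit-style configuration. The following YAML is a hypothetical reconstruction from the stated weights and densities, not a published file; the source-model repository paths are assumed to match the names given above:

```yaml
# Hypothetical mergekit config reconstructed from the stated parameters.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: acc_MRL4096
    parameters:
      weight: 0.5
      density: 0.2
  - model: accfmt_MRL4096
    parameters:
      weight: 0.5
      density: 0.2
dtype: bfloat16
```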

Key Characteristics

  • Architecture: Merged model based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: Supports a substantial context window of 131072 tokens.
  • Merge Method: Utilizes the DARE TIES technique for combining model weights.
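The DARE TIES merge listed above can be sketched in a few lines. The following is a simplified, illustrative implementation over flat weight vectors, not the actual merge code (real merges operate tensor-by-tensor over full checkpoints, and the function and variable names here are hypothetical):

```python
import numpy as np

def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Simplified DARE-TIES merge over flat weight vectors.

    base      -- 1-D array of base-model weights
    finetuned -- list of 1-D arrays, one per source model
    weights   -- per-model merge weights (0.5 each for this model)
    density   -- fraction of delta parameters kept (0.2 here)
    """
    rng = np.random.default_rng(seed)
    weighted_deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base                              # task vector vs. base
        keep = rng.random(delta.shape) < density       # DARE: random drop
        delta = np.where(keep, delta / density, 0.0)   # rescale survivors
        weighted_deltas.append(w * delta)

    # TIES: elect a per-parameter sign from the summed weighted deltas,
    # then average only the contributions that agree with that sign.
    elected = np.sign(sum(weighted_deltas))
    merged = np.zeros_like(base)
    agree_count = np.zeros_like(base)
    for d in weighted_deltas:
        agrees = (np.sign(d) == elected) & (d != 0)
        merged += np.where(agrees, d, 0.0)
        agree_count += agrees
    merged /= np.maximum(agree_count, 1)
    return base + merged
```

With density 0.2, roughly 80% of each model's delta parameters are zeroed out, which is what lets two fine-tunes be combined with limited interference.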

Potential Use Cases

Given its merged nature and the specific components involved, this model is likely suitable for applications where the combined expertise of the source models is beneficial. Its large context window also makes it applicable for tasks requiring extensive input understanding or generation.