Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2

Text generation · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: Jan 1, 2026 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2 is a 1.5-billion-parameter language model produced with the TIES merge method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base. It combines two fine-tuned actor models, acc_MRL4096_ROLLOUT4_LR2e-6 and accfmt_MRL4096_ROLLOUT4_LR2e-6, and supports a context length of 131072 tokens. The merge is intended to combine the strengths of both fine-tunes in a single checkpoint.

Model Overview

This model, merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2, is a 1.5-billion-parameter language model created by Zachary1150. It was built with the TIES merge method from mergekit, which combines the weights of multiple fine-tuned models into a single checkpoint. The base model for the merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
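
A minimal usage sketch, assuming the repository loads with the standard transformers AutoModel classes as its DeepSeek-R1-Distill-Qwen-1.5B base does (the prompt and generation settings are illustrative, not recommendations from the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the listed quantization
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the TIES merging method."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```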

Key Characteristics

  • Merge Method: Uses TIES (TrIm, Elect Sign & Merge), which trims low-magnitude parameter changes, elects a per-parameter majority sign, and merges only the updates that agree with it, reducing interference between the source models.
  • Base Model: Built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, inheriting its foundational capabilities.
  • Constituent Models: The merge incorporates two actor models, acc_MRL4096_ROLLOUT4_LR2e-6 and accfmt_MRL4096_ROLLOUT4_LR2e-6, each merged with a weight of 0.5 and a density of 0.2 (i.e., only the top 20% of each model's parameter deltas by magnitude are retained); see the configuration sketch after this list.
  • Context Length: Supports a substantial context window of 131072 tokens, allowing for processing and generating longer sequences of text.
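
The model card does not publish the exact mergekit configuration, but the parameters above imply something close to the following sketch. The constituent model paths are placeholders (their repositories are not named on the card), and the `normalize` setting is an assumption:

```yaml
# Hypothetical reconstruction of the merge config; the actor model
# paths below are placeholders, not published repository names.
models:
  - model: path/to/acc_MRL4096_ROLLOUT4_LR2e-6
    parameters:
      weight: 0.5
      density: 0.2
  - model: path/to/accfmt_MRL4096_ROLLOUT4_LR2e-6
    parameters:
      weight: 0.5
      density: 0.2
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
parameters:
  normalize: true  # assumed, not stated on the model card
dtype: bfloat16
```

A config like this would be run with mergekit's CLI, e.g. `mergekit-yaml config.yml ./merged-model`.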

Potential Use Cases

Given its merge-based construction from specialized actor models, this model is likely suitable for applications that benefit from the combined expertise of its components. Developers interested in exploring the effects of TIES merging on specific fine-tuned models, particularly those derived from the DeepSeek-R1-Distill-Qwen-1.5B family, may find this model valuable for research and development.
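
For readers studying the effects of TIES merging, the following toy sketch shows what the trim, elect, and merge steps do on a single weight tensor. It is an illustration of the published algorithm (Yadav et al., 2023) under this model's density 0.2 and weight 0.5 settings, not mergekit's actual implementation, and the example tensors are made up:

```python
import torch

def ties_merge_tensor(base, finetuned, density=0.2, weights=None):
    """Toy TIES merge of a single weight tensor."""
    weights = weights or [1.0] * len(finetuned)
    deltas = [ft - base for ft in finetuned]  # task vectors vs. the base
    trimmed = []
    for d in deltas:
        # Trim: keep only the top-`density` fraction of entries by magnitude.
        k = max(1, int(density * d.numel()))
        cutoff = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= cutoff, d, torch.zeros_like(d)))
    weighted = [w * t for w, t in zip(weights, trimmed)]
    # Elect: per-parameter majority sign across the weighted task vectors.
    elected = torch.sign(sum(weighted))
    # Merge: average only the entries whose sign agrees with the elected sign.
    agreeing = [torch.where(torch.sign(t) == elected, t, torch.zeros_like(t))
                for t in weighted]
    counts = sum((a != 0).float() for a in agreeing).clamp(min=1.0)
    return base + sum(agreeing) / counts

# Two toy "fine-tuned" tensors around a shared base:
base = torch.zeros(10)
ft_a = base + torch.tensor([0.9, -0.1, 0.0, 0.2, 0.0, 0.0, -0.8, 0.1, 0.0, 0.0])
ft_b = base + torch.tensor([0.7, 0.1, 0.0, -0.9, 0.0, 0.0, -0.6, 0.0, 0.0, 0.0])
print(ties_merge_tensor(base, [ft_a, ft_b], density=0.2, weights=[0.5, 0.5]))
```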