Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 25, 2025 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model created by Zachary1150 by merging pre-trained models with the TIES method. It is based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and supports a context length of 131072 tokens. The model targets applications that benefit from merged architectures, combining the strengths of its constituent models.
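
A minimal usage sketch with the Hugging Face transformers library follows; the model ID comes from this card, while the prompt and generation settings are illustrative assumptions.

```python
# Minimal sketch: load the merged model for text generation. The model ID is
# taken from this card; prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, as listed in the metadata above
    device_map="auto",
)

prompt = "Summarize the idea behind TIES model merging."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```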


Model Overview

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model developed by Zachary1150. It was constructed with the TIES merge method (TrIm, Elect Sign & Merge), which combines multiple fine-tuned models into a single, more capable model by trimming low-magnitude parameter changes, electing a majority sign per parameter, and merging only the changes that agree with that sign. The base model for the merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, giving it a foundation in the Qwen architecture.
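
As a concrete illustration, the sketch below reconstructs the TIES merge step (trim, elect sign, disjoint merge) for a single parameter tensor in PyTorch. It is a simplified reading of the published algorithm, not the exact implementation used to build this model; the default density and weight of 0.5 match the configuration described in the next section.

```python
import torch

def ties_merge(base: torch.Tensor, tuned: list[torch.Tensor],
               density: float = 0.5, weight: float = 0.5) -> torch.Tensor:
    """Merge fine-tuned tensors into `base` following a simplified TIES recipe."""
    # Task vectors: each fine-tuned model's delta from the shared base.
    deltas = [t - base for t in tuned]

    # Trim: keep only the top-`density` fraction of entries by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        cutoff = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= cutoff, d, torch.zeros_like(d)))

    stacked = torch.stack(trimmed)

    # Elect sign: per-parameter majority sign across the trimmed deltas.
    elected = torch.sign(stacked.sum(dim=0))

    # Disjoint merge: average only the entries that agree with the elected sign.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    merged_delta = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)

    return base + weight * merged_delta

# Toy check with two "actor" deltas, mirroring the 0.5/0.5 setup.
base = torch.randn(4, 4)
out = ties_merge(base, [base + torch.randn(4, 4), base + torch.randn(4, 4)])
print(out.shape)  # torch.Size([4, 4])
```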

Merge Details

This model integrates two actor models, each entering the merge with a weight of 0.5 and a density of 0.5, as specified in the TIES configuration. The merge consolidates the capabilities of the individual components, potentially improving overall performance or specializing in certain tasks. The model supports a context window of 131072 tokens, making it suitable for processing extensive inputs or generating long-form content.
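
For reference, the dictionary below mirrors how such a merge is typically declared in a mergekit-style TIES configuration. The two actor model names are hypothetical placeholders, since the card does not identify them; only the method, base model, weight, density, and dtype come from this page.

```python
# Hypothetical mergekit-style declaration of this merge. Actor model names
# are placeholders; real mergekit configs use the same structure in YAML.
merge_config = {
    "merge_method": "ties",
    "base_model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "models": [
        {"model": "<actor-model-1>",  # placeholder: not disclosed on the card
         "parameters": {"weight": 0.5, "density": 0.5}},
        {"model": "<actor-model-2>",  # placeholder: not disclosed on the card
         "parameters": {"weight": 0.5, "density": 0.5}},
    ],
    "dtype": "bfloat16",  # matches the BF16 quantization in the metadata
}
```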

Potential Use Cases

  • Applications requiring a model with a very long context window.
  • Scenarios where combining the strengths of multiple specialized models is beneficial.
  • Research into model merging techniques and their practical applications.