Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 25, 2025 | Architecture: Transformer | Warm

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5-billion-parameter language model created by Zachary1150 by merging two fine-tuned models with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It targets tasks that call for a compact yet capable language model, with the merge intended to combine the strengths of its constituent fine-tunes in their respective domains.


Model Overview

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model developed by Zachary1150. It was constructed with the TIES merge method from mergekit, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model.
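
Because the base model is a standard Qwen2-style causal language model, the merge should load with the Hugging Face transformers library in the usual way. The snippet below is a minimal inference sketch, not an official usage example from the author; the prompt and generation settings are illustrative assumptions.

```python
# Minimal inference sketch (assumed usage; not provided by the model author).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The DeepSeek-R1-Distill base ships with a chat template; we assume the merge keeps it.
messages = [{"role": "user", "content": "Explain the TIES merge method in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```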

Merge Details

This model is a composite of two distinct fine-tuned models, each contributing a weight of 0.5 and a density of 0.5. The merge was normalized and performed in the bfloat16 data type; a hypothetical mergekit configuration reflecting these settings is sketched after the list below. The specific models merged were:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
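
The exact mergekit configuration is not published with the model, so the sketch below only reconstructs it from the details above (TIES, weight and density 0.5, normalization, bfloat16); the file name and output directory are hypothetical.

```python
# Hypothetical reconstruction of the mergekit config for this TIES merge.
from pathlib import Path

ties_config = """\
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16
"""

# Write the config and run mergekit's CLI on it, e.g.:
#   mergekit-yaml ties_merge.yaml ./merged-model
Path("ties_merge.yaml").write_text(ties_config)
```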

Potential Use Cases

Given its architecture as a merged model, it is likely optimized for:

  • Resource-constrained environments: Its 1.5B parameter count makes it suitable for deployment where computational resources are limited (see the quantized-loading sketch after this list).
  • Specific domain tasks: The underlying merged models suggest potential specialization, making it a candidate for tasks aligned with their original fine-tuning objectives.
  • Experimentation with merged architectures: Developers interested in the performance characteristics of TIES-merged models built on DeepSeek's distilled Qwen architecture may find it a useful test case.
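
For the resource-constrained case, one common option is 4-bit quantized loading via bitsandbytes. The snippet below is a sketch that assumes a CUDA GPU and the bitsandbytes package are available; it is not a recommendation from the model author.

```python
# Sketch: load the merge in 4-bit to reduce memory use (assumes CUDA + bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```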