Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2

Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 1, 2026 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2 is a 1.5 billion parameter language model merged with the TIES method on top of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two fine-tuned actor checkpoints, 'cos_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7', each contributing with a weight of 0.5 and a density of 0.2, and supports a context length of 131072 tokens. It is intended for tasks that benefit from the combined strengths of its constituent models.


Model Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2, is a 1.5 billion parameter language model created by Zachary1150. It was built with the TIES merge method (TrIm, Elect Sign & Merge), which combines multiple fine-tuned models into a single model by trimming each task vector to its highest-magnitude parameters, electing a majority sign per parameter, and averaging only the values that agree with that sign. The base model for the merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, so the model is built on the Qwen architecture.
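As an illustration of the method (not the actual merge code used for this model), the sketch below applies the three TIES steps to a single parameter tensor; `ties_merge_tensor` is a hypothetical helper, and the signature is an assumption for clarity:

```python
import torch

def ties_merge_tensor(base, finetuned, weights, density=0.2):
    """Illustrative TIES merge of one parameter tensor (a sketch, not
    the implementation behind this checkpoint)."""
    deltas = []
    for ft in finetuned:
        d = ft - base                                    # task vector
        k = max(1, int(density * d.numel()))             # trim: keep top-`density` fraction
        thresh = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        deltas.append(torch.where(d.abs() >= thresh, d, torch.zeros_like(d)))
    stacked = torch.stack([w * d for w, d in zip(weights, deltas)])
    elected = torch.sign(stacked.sum(dim=0))             # elect the majority sign per parameter
    mask = torch.sign(stacked) == elected                # drop contributions with conflicting sign
    w_bcast = torch.tensor(weights, dtype=base.dtype).view(-1, *([1] * base.dim()))
    denom = (mask * w_bcast).sum(dim=0).clamp(min=1e-8)  # total weight of agreeing models
    merged = torch.where(mask, stacked, torch.zeros_like(stacked)).sum(dim=0) / denom
    return base + merged                                 # disjoint weighted mean, added to base
```

With `weights=[0.5, 0.5]` and `density=0.2`, this mirrors the configuration encoded in the model name (w0.5, density0.2).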

Merge Details

The merge process involved two specific fine-tuned models, both identified as 'actor' checkpoints from distinct training runs:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

Each of these models was merged with a weight of 0.5 and a density of 0.2, as configured in the merge YAML: the two models are weighted equally, and each task vector is trimmed to its top 20% of parameters by magnitude before sign election. The model supports a context length of 131072 tokens.
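The card does not reproduce the YAML itself; a plausible reconstruction in mergekit's configuration format, assuming mergekit was the merging tool (the `dtype` line is a guess, consistent with the BF16 quant listed above), would look like:

```yaml
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
dtype: bfloat16  # assumption, not stated on the card
```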

Potential Use Cases

Given its origin as a merge of two fine-tuned 'actor' checkpoints, this model is likely suited to tasks where the combined capabilities of its constituent models are beneficial. Developers looking for a compact model derived from the DeepSeek-R1-Distill-Qwen-1.5B base, enhanced through a TIES merge, may find it useful for language generation and understanding tasks.
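Since the merge produces a standard Hugging Face checkpoint, it should load with the usual transformers API. The snippet below is a minimal sketch; the prompt and generation settings are illustrative, not recommendations from the model author:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
    device_map="auto",           # requires the `accelerate` package
)

messages = [{"role": "user", "content": "Solve step by step: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```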