Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 25, 2025 | Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties is a 1.5-billion-parameter language model created by Zachary1150. It was produced by merging two models with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base. It has a context length of 131072 tokens, which makes it suitable for tasks with long inputs.

Model Overview

merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties is a 1.5-billion-parameter language model developed by Zachary1150. It was created with the mergekit tool using the TIES merge method.

Merge Details

The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Two distinct pre-trained models were combined:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface

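Each of the two models was merged with a weight of 0.5 and a density of 0.5, with normalization enabled. In TIES, density is the fraction of each model's parameter changes (relative to the base) that is kept after trimming, so a density of 0.5 retains the half with the largest magnitude. The merge was computed in the bfloat16 data type.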
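The card does not show the configuration file itself. As a rough reconstruction from the settings described above, the merge could be expressed with mergekit's configuration keys as follows. The dictionary mirrors the structure of a mergekit YAML file; the variable names `CHECKPOINT_ROOT` and `config` are illustrative, and the author's actual file may differ.

```python
import json

# Common prefix of the two local checkpoint paths listed in the card.
CHECKPOINT_ROOT = (
    "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs"
)

# Reconstruction of the merge settings described in the card (an assumption,
# not the author's original file).
config = {
    "merge_method": "ties",
    "base_model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "models": [
        {
            "model": f"{CHECKPOINT_ROOT}/acc_MRL4096_ROLLOUT4_LR1e-6"
                     "/global_step_50/actor/huggingface",
            "parameters": {"weight": 0.5, "density": 0.5},
        },
        {
            "model": f"{CHECKPOINT_ROOT}/accfmt_MRL4096_ROLLOUT4_LR1e-6"
                     "/global_step_50/actor/huggingface",
            "parameters": {"weight": 0.5, "density": 0.5},
        },
    ],
    "parameters": {"normalize": True},
    "dtype": "bfloat16",
}

# Print the structure for inspection.
print(json.dumps(config, indent=2))
```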

Key Characteristics

  • Merge Method: Uses TIES (TrIm, Elect Sign, and Merge), which trims small parameter changes and resolves sign conflicts between models before combining them, to reduce interference between the merged models.
  • Base Model: Built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture, inheriting its foundational capabilities.
  • Context Length: 131072 tokens, which allows very long inputs to be processed.
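To make the TIES steps concrete, here is a minimal pure-Python sketch on toy four-element "models", using the same settings as this merge (weight 0.5, density 0.5, normalization on). It is a simplified illustration, not mergekit's implementation: the function names and toy values are made up for the example, and real merges operate per tensor on full checkpoints.

```python
def trim(delta, density):
    """Keep the largest-magnitude `density` fraction of entries; zero the rest."""
    k = int(round(density * len(delta)))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, weights, density, normalize=True):
    """Simplified TIES: trim task vectors, elect a sign per entry, merge."""
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[m - b for m, b in zip(model, base)] for model in models]
    # Trim each task vector, then scale it by its model's weight.
    weighted = [
        [w * d for d in trim(delta, density)]
        for delta, w in zip(deltas, weights)
    ]
    merged = []
    for j, b in enumerate(base):
        column = [wd[j] for wd in weighted]
        # Elect the sign with the larger total weighted magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Only entries that agree with the elected sign contribute.
        agreeing = [(c, w) for c, w in zip(column, weights) if c * sign > 0]
        delta = sum(c for c, _ in agreeing)
        if normalize and agreeing:
            delta /= sum(w for _, w in agreeing)
        merged.append(b + delta)
    return merged


# Toy values chosen to be exactly representable in floating point.
base = [1.0, 1.0, 1.0, 1.0]
model_a = [1.5, 0.75, 1.125, 1.0]
model_b = [1.25, 2.0, 0.5, 1.125]

merged = ties_merge(base, [model_a, model_b], weights=[0.5, 0.5], density=0.5)
print(merged)  # [1.5, 2.0, 0.5, 1.0]
```

In the second entry the two models disagree in sign; the larger change wins the election and the smaller, opposing change is dropped rather than averaged in.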

Potential Use Cases

This model is suitable for applications that need a compact (1.5B-parameter) language model with a long context window, particularly where the combined behavior of the two merged checkpoints is useful.
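For trying the model out, a minimal loading sketch with the Hugging Face transformers library might look like the following. It assumes the repository is publicly available under the ID shown in the title and that `transformers` and `torch` are installed; the helper names `load_model` and `generate` are illustrative, not part of any library.

```python
MODEL_ID = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model in bfloat16 (downloads weights on first use)."""
    # Imported lazily so that defining this helper has no heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for `prompt` with default decoding settings."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` with a prompt of your choice runs on CPU by default; moving the model and inputs to a GPU is left out to keep the sketch short.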