Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties
The Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties model is a 1.5-billion-parameter language model merged from DeepSeek-R1-Distill-Qwen-1.5B using the DARE TIES method. It integrates capabilities from two specialized fine-tuned checkpoints and supports a 131072-token context length, making it suited to applications that require a compact yet capable language model with merged expertise.
Model Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties, is a 1.5 billion parameter language model created by Zachary1150. It was developed using the DARE TIES merge method, a technique designed to combine the strengths of multiple pre-trained models efficiently. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
Merge Details
The model integrates two distinct components, each merged with a weight of 0.5 and a density of 0.5, giving a balanced combination of their learned representations. The components merged were:
- A model from /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
- A model from /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
This approach aims to combine the specialized capabilities of each source model into a single, more versatile model. The merged model supports a substantial context length of 131072 tokens and is configured to use bfloat16 precision; a reconstruction of the merge configuration is sketched below.
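The merge was most likely produced with mergekit, the common tooling for DARE TIES merges, though the card does not say so explicitly. The sketch below reconstructs a plausible configuration from the details stated above (the two checkpoint paths, weight 0.5 and density 0.5 per component, bfloat16, and the DeepSeek base model); the file names, the output directory, and the assumption of mergekit itself are illustrative, not the author's confirmed recipe.

```python
# Hypothetical reconstruction of the DARE TIES merge using mergekit.
# Weights, densities, dtype, and checkpoint paths come from this card;
# file names and the output directory are illustrative.
import pathlib
import subprocess
import textwrap

CONFIG = textwrap.dedent("""\
    merge_method: dare_ties
    base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    dtype: bfloat16
    models:
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
""")

pathlib.Path("dare_ties_config.yml").write_text(CONFIG)

# mergekit-yaml is mergekit's CLI entry point: config in, merged model out.
subprocess.run(
    ["mergekit-yaml", "dare_ties_config.yml", "./merged-model"],
    check=True,
)
```

In DARE TIES, density sets the fraction of each component's delta parameters retained after DARE's random drop-and-rescale step, while weight scales each component's contribution in the TIES-style sign-consensus merge; 0.5 for both splits the two components evenly.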
Key Characteristics
- Architecture: Merged from DeepSeek-R1-Distill-Qwen-1.5B base.
- Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports an extended context of 131072 tokens, suitable for tasks requiring processing long inputs.
- Merge Method: Utilizes the DARE TIES method, known for its effectiveness in combining models while preserving performance.
Potential Use Cases
This model is suited to applications where a compact yet capable language model with merged expertise is beneficial, particularly tasks that can exploit its extended context window. A minimal loading example is sketched below.
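Since the merged checkpoint is published as a standard Hugging Face model, it can be loaded with the transformers library like any other causal language model. The snippet below is a minimal sketch: the model id comes from this card, while the prompt and generation settings are purely illustrative.

```python
# Minimal usage sketch; the model id comes from this card, the prompt and
# generation parameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists bfloat16 as the configured precision
    device_map="auto",
)

prompt = "Summarize the key ideas of model merging in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in bfloat16 matches the precision the card lists and roughly halves memory relative to float32, which matters when approaching the 131072-token context limit.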