Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties is a 1.5-billion-parameter language model merge created with the DARE TIES method. It is based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, combines two actor models, and supports a context length of 131,072 tokens. The result is a compact model intended for applications that need long-context capability, with model merging used in the hope of improved performance over either component alone.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties, is a 1.5-billion-parameter language model created by Zachary1150. It was built with the mergekit tool using the DARE TIES merge method, which combines DARE's random pruning and rescaling of weight deltas with TIES-Merging's sign-consensus averaging.
Key Characteristics
- Base Model: The merge is built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
- Merged Components: It integrates two distinct actor models, each contributing with a weight of 0.5 and a density of 0.5, a balanced combination.
- Merge Method: Utilizes the DARE TIES technique, which randomly prunes and rescales each model's weight deltas, then merges the survivors under a sign-consensus rule.
- Context Length: Features a 131,072-token context window, allowing very long inputs to be processed.
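A mergekit configuration matching the characteristics above would look roughly like the following. This is a hypothetical reconstruction: the card does not name the two actor models, so placeholder identifiers are used, and the dtype is an assumption.

```yaml
# Hypothetical mergekit config sketch; the two actor model names
# are placeholders -- the card does not list them.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: actor-model-1   # placeholder
    parameters:
      weight: 0.5
      density: 0.5
  - model: actor-model-2   # placeholder
    parameters:
      weight: 0.5
      density: 0.5
dtype: bfloat16            # assumed, not stated on the card
```

Here `density` controls what fraction of each delta's entries DARE keeps, and `weight` scales each actor's contribution to the final merge.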
Potential Use Cases
- Long-Context Applications: Ideal for tasks requiring extensive contextual understanding, such as summarizing long documents, code analysis, or complex conversational agents.
- Resource-Constrained Environments: As a 1.5B parameter model, it offers a balance of capability and efficiency, suitable for deployment where larger models are impractical.
- Experimental Merging: Useful for researchers and developers interested in exploring the effects of the DARE TIES merging strategy on specific base models and components.
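For readers exploring the merging strategy itself, the arithmetic behind DARE TIES can be sketched on toy weight vectors. This is an illustrative simplification, not mergekit's implementation: DARE randomly drops a fraction of each actor's delta from the base and rescales the survivors, and TIES then keeps only contributions agreeing with a per-parameter elected sign.

```python
import numpy as np

rng = np.random.default_rng(0)

def dare(delta, density, rng):
    # DARE: randomly drop (1 - density) of the delta's entries,
    # then rescale survivors by 1/density so the expected
    # magnitude of the delta is preserved.
    mask = rng.random(delta.shape) < density
    return delta * mask / density

def dare_ties_merge(base, actors, density=0.5, weight=0.5, rng=rng):
    # Task vectors: each actor's difference from the base weights,
    # sparsified by DARE and scaled by the per-model weight.
    weighted = [weight * dare(a - base, density, rng) for a in actors]
    stacked = np.stack(weighted)
    # TIES sign election: per parameter, take the sign of the
    # summed contributions (the direction with more total mass).
    elected = np.sign(stacked.sum(axis=0))
    # Keep only contributions that agree with the elected sign,
    # then sum them into a single merged delta.
    agree = np.where(np.sign(stacked) == elected, stacked, 0.0)
    return base + agree.sum(axis=0)

base = np.zeros(8)
actor_a = np.array([1.0, -1.0, 0.5, 0.0,  2.0, -0.5, 1.0, 0.0])
actor_b = np.array([1.0,  1.0, 0.5, 0.0, -2.0, -0.5, 1.0, 0.0])
merged = dare_ties_merge(base, [actor_a, actor_b])
print(merged)
```

Where the two actors pull a parameter in opposite directions (indices 1 and 4 above), only the winning sign survives; where they agree, the contributions reinforce each other.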