Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. This model was developed using the DARE TIES merge method, combining two specialized pre-trained models. It features a substantial context length of 131072 tokens, making it suitable for tasks requiring extensive contextual understanding and processing.
Model Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties, is a 1.5 billion parameter language model developed by Zachary1150. It is built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model and supports a context length of 131072 tokens.
Key Characteristics
- Merge Method: Created using the DARE TIES merge method, which sparsifies each fine-tuned model's parameter deltas (DARE) and resolves sign conflicts between them (TIES) before combining, allowing the merge to retain the individual strengths of its constituent models.
- Base Model: Derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, indicating a foundation in a Qwen-based architecture.
- Merged Components: Integrates two distinct models, /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface, each contributing with a weight of 0.5.
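The DARE TIES procedure behind this merge can be illustrated on toy tensors. The following is a minimal NumPy sketch of the general technique (drop deltas, rescale, elect signs, average agreeing deltas), not the exact code used to produce this model; the function name dare_ties_merge and the default drop rate are assumptions for demonstration.

```python
import numpy as np

def dare(delta, drop_rate, rng):
    # DARE: randomly drop a fraction of delta weights,
    # rescale survivors by 1/(1 - drop_rate) to preserve expectation.
    mask = rng.random(delta.shape) >= drop_rate
    return delta * mask / (1.0 - drop_rate)

def dare_ties_merge(base, finetuned, weights, drop_rate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Task vectors = finetuned - base, sparsified via DARE and scaled
    # by each model's merge weight (0.5 each in this model's case).
    deltas = [dare(ft - base, drop_rate, rng) * w
              for ft, w in zip(finetuned, weights)]
    stacked = np.stack(deltas)
    # TIES sign election: pick the dominant sign per parameter.
    elected = np.sign(stacked.sum(axis=0))
    # Keep only deltas agreeing with the elected sign, then average them.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + summed / counts
```

With a drop rate of 0 and two identical fine-tunes, the merge simply recovers the shared delta; with conflicting signs, only the majority direction survives.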
Potential Use Cases
Given its merged nature and large context window, this model is likely optimized for:
- Tasks requiring the synthesis of capabilities from its constituent models.
- Applications benefiting from processing and understanding very long texts or complex documents.
- Research into model merging techniques and their impact on performance.
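Merges of this kind are commonly produced with a tool such as mergekit, which supports a dare_ties merge method. A hypothetical configuration matching the setup described above might look like the sketch below; the density value (the fraction of deltas kept by DARE) is an assumption, as it is not stated on this page.

```yaml
# Illustrative mergekit config; density is assumed, not documented here.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
dtype: bfloat16
```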