Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties
Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model created by Zachary1150, merged using the TIES method with deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base. It combines two fine-tuned checkpoints of that base into a single model. With a context length of 131072 tokens, it is designed for tasks requiring extensive contextual awareness over very long inputs.
Model Overview
Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the TIES merge method from mergekit, leveraging deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its foundational base model. This merging technique combines the strengths of multiple pre-trained models into a single, more capable entity.
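To make the TIES procedure concrete, the sketch below shows its three core steps (trim, elect sign, disjoint merge) on toy flat parameter vectors. This is an illustrative simplification, not mergekit's actual implementation; the function name and the toy tensors are invented for the example.

```python
import numpy as np

def ties_merge(base, task_models, weight=0.5, density=0.5):
    """Toy TIES merge on flat parameter vectors (illustrative only)."""
    # 1. Task vectors: each fine-tuned model's delta from the base.
    deltas = [m - base for m in task_models]
    # 2. Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(round(density * d.size)))
        thresh = np.sort(np.abs(d))[::-1][k - 1]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    # 3. Elect sign: per-parameter majority sign across the trimmed deltas.
    stacked = np.stack(trimmed)
    sign = np.sign(np.sum(stacked, axis=0))
    sign[sign == 0] = 1.0  # break ties arbitrarily toward positive
    # 4. Disjoint merge: weighted average of only the deltas that agree
    #    with the elected sign (normalized by the agreeing weights).
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    weighted_sum = (weight * stacked * agree).sum(axis=0)
    denom = (weight * agree).sum(axis=0)
    merged_delta = np.where(denom > 0, weighted_sum / np.maximum(denom, 1e-9), 0.0)
    return base + merged_delta

# Toy example: two "fine-tuned" vectors diverging from a zero base.
base = np.zeros(4)
m1 = np.array([1.0, -2.0, 0.1, 0.0])
m2 = np.array([0.5, 2.0, 0.0, -0.3])
merged = ties_merge(base, [m1, m2], weight=0.5, density=0.5)
# → [0.75, 2.0, 0.0, 0.0]: param 0 averages the agreeing deltas,
#   param 1 keeps only the delta matching the elected sign.
```

Note how sign conflicts (parameter 1 above) are resolved by dropping the minority-sign delta rather than averaging it away, which is the key difference between TIES and a plain linear merge.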
Merge Details
The model integrates two specific pre-trained components, each contributing to its overall capabilities:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
The merge configuration utilized a weight of 0.5 and a density of 0.5 for each component, with normalization enabled and bfloat16 as the data type. This precise merging strategy aims to balance the contributions of the constituent models.
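A mergekit configuration matching the parameters described above might look like the following. This is a hypothetical reconstruction (the original config file is not published here); the checkpoint paths are those listed in the merge details, and the weight, density, normalize, and dtype values are taken from the description above.

```yaml
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
parameters:
  normalize: true
dtype: bfloat16
```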
Key Characteristics
- Architecture: Merged model based on DeepSeek-R1-Distill-Qwen-1.5B.
- Parameter Count: 1.5 billion parameters.
- Context Length: Supports an extensive context window of 131072 tokens, enabling processing of very long inputs and generating coherent, contextually relevant outputs over extended sequences.
Potential Use Cases
Given its large context window and merged nature, this model is suitable for applications requiring:
- Long-form content generation: Summarization, document analysis, or creative writing over extensive texts.
- Complex reasoning: Tasks that benefit from processing a broad range of information simultaneously.
- Specialized language tasks: Where the specific characteristics inherited from its merged components provide an advantage.