Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150, built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. This model was developed using the DARE TIES merge method, combining two specific pre-trained language models. It features a substantial context length of 131072 tokens, making it suitable for tasks requiring extensive contextual understanding.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using mergekit and specifically employs the DARE TIES merge method, as detailed in the DARE TIES paper.
Merge Details
The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Two distinct pre-trained language models were combined to create this merged model:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
Each of these constituent models was assigned a weight of 0.5 and a density of 0.5 during the merging process. The configuration also specified normalize: true and dtype: bfloat16.
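Based on the parameters listed above, the mergekit configuration likely looked approximately like the following sketch (reconstructed from the stated settings, not the author's actual config file):

```yaml
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16
```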
Key Characteristics
- Parameter Count: 1.5 billion parameters.
- Context Length: Supports a context window of 131072 tokens.
- Merge Method: Utilizes DARE TIES, which randomly prunes and rescales each fine-tuned model's parameter deltas (DARE) and resolves sign conflicts between the contributing models (TIES), reducing interference when the weights are combined.
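The DARE TIES mechanics can be sketched on flat parameter vectors. This is an illustration of the general technique only, not mergekit's actual implementation; the `normalize` step and per-tensor handling are omitted for brevity, and all names here are hypothetical.

```python
import numpy as np

def dare_ties_merge(base, tuned_models, density=0.5, weight=0.5, seed=0):
    """Toy DARE TIES merge over flat parameter vectors.

    DARE: drop each task-vector entry with probability (1 - density)
    and rescale survivors by 1/density.
    TIES: elect a majority sign per parameter and keep only the
    contributions that agree with it.
    """
    rng = np.random.default_rng(seed)
    deltas = []
    for tuned in tuned_models:
        delta = tuned - base                      # task vector vs. base model
        keep = rng.random(delta.shape) < density  # DARE drop mask
        deltas.append(np.where(keep, delta / density, 0.0) * weight)
    stacked = np.stack(deltas)
    elected = np.sign(stacked.sum(axis=0))        # TIES sign election
    agree = np.where(np.sign(stacked) == elected, stacked, 0.0)
    return base + agree.sum(axis=0)
```

With `density=1.0` nothing is dropped, so two identical fine-tunes at weight 0.5 each reconstruct the full shared delta; at lower densities the sparsification is stochastic, which is why mergekit fixes a seed for reproducibility.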
Potential Use Cases
Given its DeepSeek-R1-Distill base and the DARE TIES merge of two closely related fine-tuned checkpoints, this model is likely best suited to the reasoning-oriented tasks those checkpoints were trained for. Its 131072-token context window also makes it a reasonable choice for applications that process long inputs.