Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model merge built on the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It was created with the DARE TIES merge method, combining two actor models from baselines_openrs checkpoints. It is intended for general language tasks, with the merge aiming to combine the strengths of its two constituent checkpoints.
Model Overview
This model, merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties, is a 1.5 billion parameter language model created by Zachary1150. It is a merge of pre-trained language models produced with the DARE TIES merge method, which combines DARE's random pruning and rescaling of task vectors (the deltas between fine-tuned and base weights) with TIES-Merging's sign-consensus averaging.
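To make the method concrete, here is a minimal sketch of DARE TIES on flat parameter arrays. This is an illustrative NumPy implementation of the algorithm as described above, not mergekit's actual code; the function name and the tie-breaking details are this sketch's own choices.

```python
import numpy as np

def dare_ties_merge(base, deltas, weights, density=0.5, seed=0):
    """Sketch of DARE TIES merging on flat parameter arrays.

    `deltas` are task vectors (fine-tuned weights minus base weights).
    Conceptual illustration only; not mergekit's implementation.
    """
    rng = np.random.default_rng(seed)
    pruned = []
    for d in deltas:
        # DARE: randomly drop a (1 - density) fraction of each delta and
        # rescale survivors by 1/density so the expectation is unchanged.
        mask = rng.random(d.shape) < density
        pruned.append(np.where(mask, d / density, 0.0))
    # TIES: elect a per-parameter sign from the weighted sum of deltas.
    elected = np.sign(sum(w * d for w, d in zip(weights, pruned)))
    # Keep only contributions whose sign agrees with the elected sign,
    # then normalize by the total agreeing weight.
    total = np.zeros_like(base)
    norm = np.zeros_like(base)
    for w, d in zip(weights, pruned):
        agree = np.sign(d) == elected
        total += w * np.where(agree, d, 0.0)
        norm += w * agree
    return base + total / np.maximum(norm, 1e-8)
```

With two agreeing deltas and equal weights of 0.5, as in this merge, the result is simply the base plus the (pruned) average delta.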
Merge Details
The model's foundation is the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. Two distinct actor models from baselines_openrs checkpoints were combined to form this merge:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
Each contributing model was assigned a weight of 0.5 and a density of 0.5 during the merging process, with normalization applied. The merge was performed using mergekit.
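Given the base model, weights, densities, and normalization stated above, the mergekit configuration plausibly resembled the following. This is a reconstruction from the card's stated parameters, not the original config file:

```yaml
# Hypothetical reconstruction; dtype and other unstated options omitted.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
```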
Potential Use Cases
As a merge of two specialized actor checkpoints, this model is likely suitable for general language generation and understanding tasks, potentially benefiting from the combined strengths of its constituent models.