Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties is a 1.5-billion-parameter language model created by Zachary1150. It merges two pre-trained models on top of the base model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using the DARE TIES merge method. With a context length of 131,072 tokens, it is suitable for tasks requiring extensive contextual understanding, and the merged architecture targets general language generation and understanding with potentially improved performance over either constituent model.
Overview
This model, Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties, is a 1.5-billion-parameter language model developed by Zachary1150. It was created using the DARE TIES merge method, which combines DARE's drop-and-rescale sparsification of task vectors with TIES-Merging's sign election, and is built on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model.
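To make the DARE TIES method concrete, here is a minimal toy sketch on NumPy arrays. This is an illustration of the general technique, not mergekit's actual implementation: `dare` randomly drops task-vector entries and rescales survivors by 1/density, and `dare_ties` then elects a per-parameter sign from the weighted sum and averages the agreeing deltas. All function names and the toy vectors are hypothetical.

```python
import numpy as np

def dare(delta, density, rng):
    """DARE: randomly drop task-vector entries with probability (1 - density),
    rescaling survivors by 1/density so the expected delta is preserved."""
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def dare_ties(base, finetuned, weights, density=0.5, seed=0):
    """Toy DARE TIES merge: sparsify each weighted task vector with DARE,
    elect a per-parameter sign (TIES), then average the deltas whose sign
    agrees with the elected one, normalized by the agreeing weight mass."""
    rng = np.random.default_rng(seed)
    deltas = [w * dare(ft - base, density, rng)
              for w, ft in zip(weights, finetuned)]
    elected = np.sign(sum(deltas))  # TIES sign election per parameter
    merged = np.zeros_like(base)
    mass = np.zeros_like(base)
    for w, d in zip(weights, deltas):
        agree = np.sign(d) == elected
        merged += np.where(agree, d, 0.0)
        mass += np.where(agree & (d != 0), w, 0.0)
    # Normalize by the total weight of agreeing (non-dropped) contributions.
    merged = np.where(mass > 0, merged / np.maximum(mass, 1e-12), 0.0)
    return base + merged

# Toy demonstration with two "fine-tuned" parameter vectors.
base = np.zeros(6)
m1 = np.array([1.0, -1.0, 2.0, 0.0,  3.0, -2.0])
m2 = np.array([1.0,  1.0, 2.0, 0.0, -3.0, -2.0])
print(dare_ties(base, [m1, m2], weights=[0.5, 0.5], density=0.5))
```

With `density=1.0` (no dropping) the behavior is deterministic: parameters where the two models disagree in sign cancel out of the sign election, while agreeing parameters survive at full strength.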
Merge Details
The model integrates two distinct pre-trained language models, specifically:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
Each constituent model contributed a weight of 0.5 at a density of 0.5, and the merge process normalized the combined parameters. This configuration aims to combine the strengths of both constituent models, potentially improving performance across a range of language tasks. The model supports a context length of 131,072 tokens.
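A merge with these settings could be expressed as a mergekit-style YAML configuration of roughly the following shape. This is a reconstruction from the parameters stated above (weight 0.5, density 0.5, normalization, DARE TIES, the listed checkpoints), not the author's actual config file; field names follow mergekit's documented schema.

```yaml
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
```

The `normalize: true` setting corresponds to the parameter normalization described above; all other values are taken directly from the merge details.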