Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties is a 1.5-billion-parameter language model merged using the TIES method, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two distinct checkpoints, 'cos_MRL4096_ROLLOUT4_LR1e-6' and 'accfmt_MRL4096_ROLLOUT4_LR1e-6', each contributing a weight and density of 0.5. The merge is intended to fold the strengths of both checkpoints into a single model for general language tasks.
Model Overview
This model, merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties, is a 1.5 billion parameter language model created by Zachary1150. It was developed using the TIES merge method from the mergekit framework, building upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model.
Merge Details
The model integrates two distinct pre-trained checkpoints, each contributing equally with a weight and density of 0.5. The merged components are:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
This merging strategy aims to combine the learned representations of the two checkpoints into a single, more capable language model. The merge was configured to normalize parameters and to use the bfloat16 data type for efficiency.
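For reference, mergekit merges of this kind are driven by a YAML configuration passed to the mergekit-yaml CLI. The sketch below reconstructs a plausible TIES configuration from the parameters stated above (equal 0.5 weight and density per checkpoint, parameter normalization, bfloat16). The exact configuration file used for this model was not published, so the config file name and output directory here are illustrative.

```python
import subprocess
import textwrap

# Hypothetical reconstruction of the TIES merge config from the stated
# parameters; the actual config used for this model may differ.
config = textwrap.dedent("""\
    merge_method: ties
    base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    models:
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
    parameters:
      normalize: true
    dtype: bfloat16
""")

with open("ties_config.yaml", "w") as f:
    f.write(config)

# mergekit's CLI reads the YAML and writes the merged model to the
# given output directory ("./merged-model" is a placeholder).
subprocess.run(["mergekit-yaml", "ties_config.yaml", "./merged-model"], check=True)
```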
Potential Use Cases
Given its DeepSeek-R1-Distill-Qwen-1.5B foundation and the TIES merge of two actor checkpoints, this model is likely suitable for a range of general-purpose language generation and understanding tasks where a 1.5B-parameter model with a 131,072-token context length is appropriate.
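As a minimal usage sketch, assuming the model is published under the repository id above and that transformers, torch, and accelerate are installed, it can be loaded like any other Qwen-based causal LM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 matches the dtype the merge was performed in;
# device_map="auto" places the model on GPU if one is available.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is chat-tuned, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Explain the TIES merge method in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```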