Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model merged using the TIES method, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The model combines two distinct checkpoints, 'cos_MRL4096_ROLLOUT4_LR2e-6' and 'accfmt_MRL4096_ROLLOUT4_LR2e-6', each contributing a weight of 0.5. It is designed to combine the strengths of its constituent models for general language understanding and generation tasks, offering a compact yet capable model.
Model Overview
This model, merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties, is a 1.5 billion parameter language model created by Zachary1150. It was developed using the TIES merge method (TrIm, Elect Sign & Merge), which combines multiple fine-tuned models into a single, more capable model by trimming small parameter changes, resolving sign conflicts, and merging the remaining updates. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
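The trim/elect/merge procedure described above can be sketched on flat parameter vectors. This is a simplified illustration of the TIES idea, not the exact implementation used to produce this model; the function name and defaults are chosen for this example.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5, weights=None):
    """Toy TIES merge over flat parameter vectors (illustration only)."""
    # Task vectors: the delta each fine-tuned model applies to the base.
    deltas = [ft - base for ft in finetuned]
    if weights is None:
        weights = [1.0 / len(deltas)] * len(deltas)

    # Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = int(np.ceil(density * d.size))
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))

    # Elect sign: per-parameter majority sign across the trimmed deltas.
    sign = np.sign(sum(trimmed))

    # Merge: weighted average of the deltas that agree with the elected sign.
    merged = np.zeros_like(base)
    counts = np.zeros_like(base)
    for w, t in zip(weights, trimmed):
        mask = (np.sign(t) == sign) & (t != 0)
        merged += np.where(mask, w * t, 0.0)
        counts += np.where(mask, w, 0.0)
    merged = np.where(counts > 0, merged / counts, 0.0)
    return base + merged
```

Conflicting updates (same parameter pushed in opposite directions by the two checkpoints) cancel at the sign-election step, which is the property TIES relies on to avoid interference between the merged models.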
Merge Details
Two specific checkpoints were merged to create this model:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
Each of these constituent models contributed a weight of 0.5 and was trimmed to a density of 0.5 during the merging process. The merge was configured to normalize weights and was performed in bfloat16. This approach aims to consolidate the learned representations of the individual checkpoints into one model, potentially improving overall performance and generalization.
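Assuming the merge was produced with a mergekit-style configuration (the tool is not named in this card), the settings above would correspond to a config along these lines, with the checkpoint paths abbreviated:

```yaml
# Hypothetical mergekit-style config reflecting the stated settings.
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: .../cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: .../accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16
```

Here `weight` sets each checkpoint's contribution, `density` controls the fraction of parameters kept in the trim step, and `normalize: true` rescales the combined weights so they sum to one.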