Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2 is a 1.5 billion parameter language model created by Zachary1150 by merging pre-trained models with the DARE TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It merges two fine-tuned checkpoints from baselines_openrs with the aim of combining their capabilities into a single model.
Model Overview
This model, developed by Zachary1150, is a 1.5 billion parameter language model created through a merging process. It is built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model.
Merge Details
The model was constructed using the DARE TIES merge method, which combines DARE (Drop And REscale, described in the paper https://arxiv.org/abs/2311.03099) with the sign-election step of TIES-Merging. This technique sparsifies and combines the weight differences of multiple fine-tuned models to create a new, unified model.
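The core DARE operation can be illustrated with a short sketch. This is not mergekit's actual implementation; the function names are hypothetical, and the TIES sign-election step that the full dare_ties method also applies is omitted for brevity. Each delta (the difference between a fine-tuned model and the base) is randomly dropped at a rate of 1 - density, and the survivors are rescaled so the expected contribution is unchanged:

```python
import random

def dare_sparsify(delta, density, rng=random):
    # DARE (Drop And REscale): keep each delta-parameter entry with
    # probability `density`, then rescale survivors by 1/density so
    # the expected value of the delta is preserved.
    return [d / density if rng.random() < density else 0.0 for d in delta]

def dare_merge(base, deltas, weights, density):
    # Add the weighted, sparsified deltas onto the base parameters.
    # (Hypothetical helper; the real dare_ties method also elects a
    # consensus sign per parameter before summing.)
    merged = list(base)
    for delta, w in zip(deltas, weights):
        sparse = dare_sparsify(delta, density)
        merged = [m + w * s for m, s in zip(merged, sparse)]
    return merged
```

With density 0.2, roughly 80% of each delta's entries are zeroed, yet the merged model's expected parameters match a plain weighted average of the deltas.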
Constituent Models
Two specific checkpoints from baselines_openrs were merged to form this model:
/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
Each constituent model was merged with a weight of 0.5 and a density of 0.2, with weight normalization applied. The merge was performed using mergekit.
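Given these parameters, the mergekit configuration likely resembled the following. This is a reconstruction from the stated settings, not the exact file used; options not mentioned above (such as dtype) are omitted:

```yaml
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
parameters:
  normalize: true
```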