Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model merged using the TIES method, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. This model combines two actor checkpoints, acc_MRL4096_ROLLOUT4_LR5e-7 and accfmt_MRL4096_ROLLOUT4_LR5e-7, each contributing a weight of 0.5 and a density of 0.5. It is designed for tasks that benefit from merged model capabilities, and supports a 131,072-token context length.
Model Overview
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model created by Zachary1150. It was developed by merging pre-trained language models using the TIES merge method, which is detailed in the TIES paper.
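To make the TIES merge method concrete, the sketch below runs its three steps (trim each task vector to its highest-magnitude entries, elect a per-parameter sign, then average only the values that agree with that sign) on toy arrays. This is an illustrative reimplementation of the general technique, not the actual code used to produce this model.

```python
import numpy as np

def ties_merge(base, deltas, weights, density):
    """Illustrative TIES merge: trim, elect sign, disjoint merge.

    base    -- base model parameters (array)
    deltas  -- per-model task vectors (fine-tuned weights minus base)
    weights -- per-model merge weights
    density -- fraction of each delta's entries to keep, by magnitude
    """
    # Step 1: trim each delta, keeping only the top-`density` fraction by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]  # magnitude of the k-th largest entry
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))

    w = np.asarray(weights).reshape(-1, *([1] * base.ndim))
    stacked = w * np.stack(trimmed)          # weighted, trimmed task vectors

    # Step 2: elect a sign per parameter from the weighted sum of deltas.
    sign = np.sign(stacked.sum(axis=0))

    # Step 3: merge only the values whose sign agrees with the elected sign,
    # normalizing by the total weight of the agreeing models.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    wsum = np.maximum((w * agree).sum(axis=0), 1e-8)
    return base + (stacked * agree).sum(axis=0) / wsum
```

On toy vectors, entries where the two deltas disagree in sign cancel out of the election and contribute nothing, while agreeing entries are weight-averaged and added back to the base.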
Merge Details
This model is built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model. The merge process combined two specific actor checkpoints:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
Each of these models contributed a weight of 0.5 and a density of 0.5 during the TIES merge. The configuration also specified normalize: true and dtype: bfloat16. With a context length of 131,072 tokens, this model is suitable for applications that require processing extensive inputs.
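A mergekit-style configuration consistent with the settings described above might look like the following. This is a reconstruction from the stated parameters, not the exact file used to produce this model.

```yaml
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16
```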