Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear is a 1.5-billion-parameter language model created by Zachary1150 using a linear merge. It combines two pre-trained models, cos_MRL4096_ROLLOUT4_LR5e-7 and accfmt_MRL4096_ROLLOUT4_LR5e-7, and supports a 131,072-token context length. The merge is intended to combine the strengths of its constituent models in a single checkpoint.
Overview
This model, Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear, is a 1.5-billion-parameter language model with a 131,072-token context length. It was constructed by Zachary1150 using the linear merge method via mergekit.
Merge Details
The model is a composite of two distinct pre-trained language models:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
These models were merged with weights of 0.9 for the first model (cos_MRL4096_ROLLOUT4_LR5e-7) and 0.1 for the second (accfmt_MRL4096_ROLLOUT4_LR5e-7). The merge was configured to normalize the weights and to use the bfloat16 data type, with the aim of combining the two models' capabilities in a single, more robust checkpoint.
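The original merge configuration file is not reproduced here, but a mergekit YAML file consistent with the settings described above (linear merge, weights 0.9/0.1, normalized weights, bfloat16) would look roughly like the following sketch; the checkpoint paths are those listed above, and this is an illustrative reconstruction rather than the author's original file.

```yaml
# Sketch of a mergekit linear-merge config matching the settings described above
# (reconstructed for illustration; not the author's original configuration file).
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.9   # first model: weight 0.9
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.1   # second model: weight 0.1
merge_method: linear
parameters:
  normalize: true   # normalize weights so they sum to 1
dtype: bfloat16
```

A configuration like this could be run with `mergekit-yaml config.yml ./merged-model` to produce the merged checkpoint.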