Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method. It combines two pre-trained components, 'cos_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7', weighted 0.3 and 0.7 respectively. The merge is designed to leverage the strengths of its constituent models for general language understanding and generation tasks, and the resulting model offers a 131,072 token context length.
Model Overview
This model, merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.3_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method via the mergekit tool, combining the weights of two distinct pre-trained models.
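A Linear merge is, at its core, a weighted average of corresponding parameter tensors from the source models. The following is a minimal numerical sketch of that operation, using small hypothetical NumPy arrays in place of real model weight matrices; the array values and variable names are illustrative only, not taken from the actual checkpoints.

```python
import numpy as np

# Hypothetical stand-ins for one corresponding weight matrix
# from each of the two source models (not real checkpoint data).
w_cos = np.array([[1.0, 2.0], [3.0, 4.0]])
w_accfmt = np.array([[5.0, 6.0], [7.0, 8.0]])

# Merge weights as described in the model card: 0.3 and 0.7.
weights = [0.3, 0.7]

# Linear merge with normalization: the weighted sum is divided
# by the total weight (here 1.0, so normalization is a no-op).
total = sum(weights)
merged = (weights[0] * w_cos + weights[1] * w_accfmt) / total

print(merged)
```

In a real merge this averaging is applied tensor-by-tensor across every parameter of the two checkpoints, which is what mergekit automates.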
Merge Details
The model integrates two base models:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
These models were merged with a weight of 0.3 for the first model and 0.7 for the second. The merge also applied weight normalization and used the bfloat16 data type. This approach aims to combine the learned representations and capabilities of the constituent models into a single, more robust language model with a 131,072 token context window.
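The merge settings above can be expressed as a mergekit configuration. The following is a hedged sketch of what such a config could look like given the stated method, weights, normalization, and dtype; the exact file used by the author is not included in the card, so treat this as a plausible reconstruction rather than the original.

```yaml
# Sketch of a mergekit config matching the described merge
# (reconstructed from the model card, not the author's file).
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.3
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.7
merge_method: linear
parameters:
  normalize: true
dtype: bfloat16
```

A config like this would typically be run with `mergekit-yaml config.yml ./output-dir` to produce the merged checkpoint.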