Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method via mergekit. This model combines two pre-trained actor models, with a 90% weight given to 'accfmt_MRL4096_ROLLOUT4_LR5e-7' and 10% to 'acc_MRL4096_ROLLOUT4_LR5e-7'. It is designed for tasks benefiting from the combined strengths of its constituent models, offering a substantial 131072 token context length.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.1_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method provided by mergekit, combining the capabilities of two distinct pre-trained models.
Merge Details
The merge process specifically combined two actor models:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
Configuration and Weighting
The merge applied a specific weighting to its components:
- The model accfmt_MRL4096_ROLLOUT4_LR5e-7 received a weight of 0.9.
- The model acc_MRL4096_ROLLOUT4_LR5e-7 received a weight of 0.1.
Weight normalization was enabled for this configuration, and the merge was performed in the bfloat16 data type. The resulting model retains a 131,072-token context length, making it suitable for applications requiring extensive contextual understanding.
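The details above correspond to a mergekit configuration along the following lines. This is a reconstruction from the stated method, weights, normalization, and dtype; the exact config file used by the author is not included in this card.

```yaml
# Hypothetical mergekit config reconstructed from the merge details above.
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.9
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.1
merge_method: linear
parameters:
  normalize: true
dtype: bfloat16
```

With the linear method, mergekit computes a weighted average of the parameter tensors; `normalize: true` rescales the weights to sum to 1 (here 0.9 and 0.1 already do).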