Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear is a 1.5 billion parameter language model created by Zachary1150 using a linear merge of two pre-trained models. This model is designed to combine the strengths of its constituent models, specifically focusing on length formatting and accuracy formatting. With a substantial context length of 131072 tokens, it is optimized for tasks requiring extensive contextual understanding and precise output structuring.
Model Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear, was constructed with mergekit's linear merge method, combining two pre-trained base models: accfmt_MRL4096_ROLLOUT4_LR2e-6 and len_MRL4096_ROLLOUT4_LR2e-6. The primary goal of the merge is to integrate and balance the capabilities of these two models in a single checkpoint.
Merge Details
The merge configuration assigned a weight of 0.7 to len_MRL4096_ROLLOUT4_LR2e-6 and 0.3 to accfmt_MRL4096_ROLLOUT4_LR2e-6, placing stronger emphasis on the former. Weight normalization was enabled, and bfloat16 was used as the output data type. This approach aims to leverage the specific strengths of each component model to produce a more robust and versatile merged model.
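A mergekit configuration matching the parameters described above might look like the following. This is a sketch reconstructed from the stated weights, normalization, and dtype; the exact paths and the original config file are not published here, so the model references below are assumptions.

```yaml
# Hypothetical mergekit config reconstructed from the stated merge parameters
models:
  - model: len_MRL4096_ROLLOUT4_LR2e-6
    parameters:
      weight: 0.7
  - model: accfmt_MRL4096_ROLLOUT4_LR2e-6
    parameters:
      weight: 0.3
merge_method: linear
normalize: true
dtype: bfloat16
```

With `normalize: true`, mergekit rescales the weights to sum to 1 before averaging, so the 0.7/0.3 split here is already normalized.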
Key Characteristics
- Parameter Count: 1.5 billion parameters.
- Context Length: Supports an extensive context window of 131072 tokens.
- Merge Method: Utilizes the Linear merge method to combine base models.
- Focus: Designed to integrate capabilities related to length and accuracy formatting from its constituent models.
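To make the linear merge method concrete: for each parameter tensor, the merged value is simply the weighted average of the corresponding tensors from the base models. The sketch below illustrates this with plain Python lists standing in for weight tensors; the function name `linear_merge` is hypothetical and is not part of mergekit's API.

```python
def linear_merge(tensor_lists, weights, normalize=True):
    """Weighted element-wise average of same-shaped parameter 'tensors'.

    tensor_lists: one flat list of parameter values per base model.
    weights: one scalar weight per base model (e.g. [0.7, 0.3]).
    normalize: rescale weights to sum to 1, mirroring mergekit's
    `normalize: true` option.
    """
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    # zip(*tensor_lists) walks the models' parameters position by position,
    # so each merged element is sum_i(weight_i * param_i).
    return [sum(w * x for w, x in zip(weights, col))
            for col in zip(*tensor_lists)]


# Two toy "models" merged with the 0.7 / 0.3 split used by this checkpoint.
merged = linear_merge([[1.0, 2.0], [3.0, 4.0]], [0.7, 0.3])
```

Here the first merged element is 0.7 * 1.0 + 0.3 * 3.0 = 1.6; the real merge applies the same arithmetic across every tensor of the two 1.5B-parameter models, then casts the result to bfloat16.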