Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear is a 1.5 billion parameter language model created by Zachary1150 using a linear merge. It combines two actor checkpoints from baselines_openrs with a 0.7/0.3 weighting and retains a substantial 131,072-token context length, making it suitable for applications that need a compact yet capable merged model.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the MergeKit tool, specifically employing the Linear merge method to combine two distinct pre-trained language models. The merging process involved actor checkpoints from baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7 and baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7, with a weighted average applied (0.7 for the first model, 0.3 for the second).
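Conceptually, a linear merge with normalization computes a weighted average of each pair of corresponding parameter tensors across the input checkpoints. The sketch below illustrates the arithmetic on toy arrays; it is not mergekit's actual implementation, and the function name `linear_merge` is illustrative:

```python
import numpy as np

def linear_merge(tensors, weights, normalize=True):
    """Weighted average of corresponding parameter tensors (illustrative sketch)."""
    w = np.asarray(weights, dtype=np.float64)
    if normalize:
        # With normalization enabled, weights are rescaled to sum to 1,
        # so 0.7 / 0.3 stays 0.7 / 0.3 here but e.g. 7 / 3 would too.
        w = w / w.sum()
    return sum(wi * t for wi, t in zip(w, tensors))

# Toy stand-ins for one parameter tensor from each checkpoint
a = np.array([1.0, 2.0])  # first model, weight 0.7
b = np.array([3.0, 6.0])  # second model, weight 0.3
merged = linear_merge([a, b], [0.7, 0.3])
```

In a real merge this averaging is applied tensor-by-tensor over the full state dicts of both checkpoints.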
Key Characteristics
- Architecture: Merged model using the Linear method.
- Parameter Count: 1.5 billion parameters.
- Context Length: Features a substantial 131,072 token context window.
- Merging Configuration: Uses `bfloat16` dtype and applies normalization during the merge.
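The characteristics above suggest a mergekit configuration along these lines (a reconstruction from the description, not the exact config file used; field layout follows mergekit's linear-merge schema):

```yaml
# Sketch of the described linear merge (reconstructed, not the original config)
models:
  - model: baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7
    parameters:
      weight: 0.7
  - model: baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7
    parameters:
      weight: 0.3
merge_method: linear
dtype: bfloat16
parameters:
  normalize: true
```

With `normalize: true`, mergekit rescales the weights to sum to 1 before averaging, so the 0.7/0.3 split is applied as-is.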
Potential Use Cases
This model is suitable for developers and researchers looking for:
- A compact language model derived from a weighted merge of specialized actor checkpoints.
- Applications benefiting from a very large context window (131k tokens).
- Experimentation with merged model architectures for specific tasks where the constituent models excel.