Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method. It combines two pre-trained actor models, acc_MRL4096 and accfmt_MRL4096, each contributing with a weight of 0.5. It is intended for tasks that benefit from the combined behavior of its two constituent models, and it supports a context length of 131072 tokens.
Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method via mergekit, combining two distinct pre-trained "actor" models. The merge assigned an equal weight of 0.5 to each constituent model:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
Key Characteristics
- Merge Method: Uses the Linear merge technique, which computes a weighted average of the source models' parameters. With equal weights of 0.5, this is the plain mean of the two models.
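- Constituent Models: Formed from two actor checkpoints, both saved at global_step_54. In actor-critic style reinforcement learning, "actor" usually denotes the policy model being trained, so the label most likely reflects how the checkpoints were produced rather than a specific focus on agentic tasks.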
- Configuration: Merged with normalized parameters and bfloat16 data type, indicating an optimization for efficiency and performance.
- Context Length: Features a substantial context window of 131072 tokens, allowing for processing very long inputs.
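The linear merge described in the list above can be illustrated with a minimal sketch. This is not mergekit's implementation: the function name `linear_merge`, the toy parameter name `layer.weight`, and the use of plain Python lists in place of tensors are all assumptions made for illustration. It only shows the arithmetic of a weighted average with optional weight normalization.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of same-named, same-shaped parameters.

    Hypothetical helper for illustration; parameters are flat lists of floats.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        # Rescale so the weights sum to 1.
        weights = [w / total for w in weights]

    merged = {}
    for name in state_dicts[0]:
        # Group the i-th element of this parameter across all models.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) for values in columns
        ]
    return merged


# Toy stand-ins for the two constituent checkpoints (values are made up).
acc = {"layer.weight": [1.0, 2.0, 3.0]}
accfmt = {"layer.weight": [3.0, 4.0, 5.0]}

merged = linear_merge([acc, accfmt], [0.5, 0.5])
print(merged["layer.weight"])
```

With weights of 0.5 each, every merged value is the midpoint of the two inputs. Because the weights already sum to 1, normalization leaves them unchanged in this case.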
Good For
This model is suitable for applications that can leverage the combined capabilities of its merged components, particularly in scenarios where a broad context window is beneficial. The "actor" label in the source checkpoint paths suggests that the constituents were produced by reinforcement learning training; whether that translates into utility for agent-based systems or decision-making tasks is not stated and should be verified on the target task.