Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear
Zachary1150's merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear is a 1.5-billion-parameter language model created by merging two pre-trained checkpoints with the Linear merge method. It combines checkpoints from "len_MRL4096_ROLLOUT4_LR2e-6" and "accfmt_MRL4096_ROLLOUT4_LR2e-6" with weights of 0.3 and 0.7, respectively. It is intended for tasks that benefit from the combined behavior of both source models, and it offers a 131,072-token context length.
Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear, is a 1.5-billion-parameter language model developed by Zachary1150. It was built with the mergekit tool using the Linear merge method.
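The Linear method takes a weighted average of corresponding parameters across the source models. The toy sketch below illustrates that arithmetic on plain Python lists. It is not mergekit's implementation, and the tensor name `layer.weight` and its values are hypothetical. With normalization enabled, each weight is divided by the sum of all weights. Because 0.3 and 0.7 already sum to 1.0, normalization leaves them unchanged for this particular merge.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def linear_merge(models: List[Params], weights: List[float], normalize: bool = True) -> Params:
    """Element-wise weighted average of same-shaped parameters (illustrative only)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights sum to zero; cannot normalize")
        # Rescale so the weights sum to 1.0.
        weights = [w / total for w in weights]
    merged: Params = {}
    for name in models[0]:
        # Assumes every model has the same parameter names and lengths.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(w * x for w, x in zip(weights, col)) for col in columns]
    return merged


# Toy stand-ins for the two checkpoints (hypothetical values).
len_model = {"layer.weight": [1.0, 2.0]}
accfmt_model = {"layer.weight": [3.0, 4.0]}
print(linear_merge([len_model, accfmt_model], [0.3, 0.7]))
# roughly {'layer.weight': [2.4, 3.4]}, up to floating-point rounding
```

With normalization on, passing weights of 3.0 and 7.0 gives the same result as 0.3 and 0.7, since only their ratio matters.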
Merge Details
The model is a composite of two distinct pre-trained language model checkpoints:
- A model from /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
- A model from /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
The first model received a weight of 0.3 and the second a weight of 0.7. The merge was run with normalization enabled and bfloat16 as the dtype.
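The settings above can be expressed as a mergekit configuration. The card does not include the original config file, so the sketch below is a reconstruction. It builds a dict using the key names from mergekit's documented YAML schema (models, parameters.weight, merge_method, normalize, dtype) and prints it as JSON. Whether it matches the author's exact file is an assumption, and the helper name build_linear_merge_config is hypothetical.

```python
import json

# Checkpoint paths exactly as listed in this card.
LEN_CKPT = (
    "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/"
    "len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface"
)
ACCFMT_CKPT = (
    "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/"
    "accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface"
)


def build_linear_merge_config(w_len: float = 0.3, w_accfmt: float = 0.7) -> dict:
    """Hypothetical helper: mergekit-style config for this two-model linear merge."""
    return {
        "models": [
            {"model": LEN_CKPT, "parameters": {"weight": w_len}},
            {"model": ACCFMT_CKPT, "parameters": {"weight": w_accfmt}},
        ],
        "merge_method": "linear",
        # Global parameter: divide each weight by the sum of all weights.
        "parameters": {"normalize": True},
        "dtype": "bfloat16",
    }


if __name__ == "__main__":
    # Printed as JSON; the structure mirrors what a mergekit YAML file would hold.
    print(json.dumps(build_linear_merge_config(), indent=2))
```

The usual mergekit workflow is to write this structure out as a YAML file and pass it to the mergekit-yaml command along with an output directory. That step is not shown here.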
Key Characteristics
- Merge Method: Linear (a weighted average of the source checkpoints' parameters).
- Parameter Count: 1.5 billion parameters.
- Context Length: 131,072 tokens.
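As a rough sizing check, the parameter count and the bfloat16 dtype give the memory needed just to hold the weights: about 1.5e9 parameters × 2 bytes, or roughly 3.0 GB. The sketch assumes the nominal 1.5 billion count (the exact figure is not stated here). It ignores the KV cache, activations, and framework overhead, which grow with the context length actually used.

```python
# Bytes per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NOMINAL_PARAMS = 1.5e9  # nominal count from this card; the exact count may differ

for dtype in ("bfloat16", "float32"):
    print(f"{dtype}: {weight_memory_gib(NOMINAL_PARAMS, dtype):.2f} GiB")
# bfloat16: 2.79 GiB
# float32: 5.59 GiB
```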
Potential Use Cases
This model may suit applications that benefit from the combined behavior of its two source checkpoints, "len_MRL4096_ROLLOUT4_LR2e-6" and "accfmt_MRL4096_ROLLOUT4_LR2e-6". Its 131,072-token context window may also make it useful for tasks that involve long inputs or long generated outputs.