Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear is a 1.5 billion parameter language model created by Zachary1150, formed by merging two pre-trained models with the Linear merge method. It combines specific checkpoints from 'acc_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7' with weights of 0.9 and 0.1, respectively. The merge is intended to combine the strengths of its constituent models, offering a compact yet capable option for tasks that benefit from merged model weights.
Model Overview
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear is a 1.5 billion parameter language model developed by Zachary1150. This model was constructed using the Linear merge method via mergekit, combining two specific pre-trained checkpoints.
Merge Details
This model is a blend of two base models:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface (weighted at 0.9)
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface (weighted at 0.1)
The merge process used a weighted linear combination with weight normalization, computed in the bfloat16 data type, consolidating the learned representations of both source models into a single, efficient model.
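The exact mergekit configuration is not included here, but a normalized linear merge reduces to a weighted average of corresponding parameters. Below is a minimal, illustrative sketch of that computation in PyTorch with transformers; it is not mergekit's internal implementation, and the `linear_merge` helper is hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM

# The two source checkpoints listed above (local paths from the merge).
PATH_A = "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface"
PATH_B = "/local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface"
WEIGHTS = [0.9, 0.1]

def linear_merge(paths, weights, dtype=torch.bfloat16):
    """Hypothetical helper: normalized weighted average of model parameters."""
    # Normalize weights so they sum to 1, mirroring a `normalize: true` merge.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Load the first model as the target and scale its parameters.
    merged = AutoModelForCausalLM.from_pretrained(paths[0], torch_dtype=dtype)
    merged_sd = merged.state_dict()
    for key in merged_sd:
        merged_sd[key] = merged_sd[key] * weights[0]

    # Accumulate the remaining models parameter-wise.
    for path, w in zip(paths[1:], weights[1:]):
        other = AutoModelForCausalLM.from_pretrained(path, torch_dtype=dtype)
        for key, tensor in other.state_dict().items():
            merged_sd[key] = merged_sd[key] + tensor * w

    merged.load_state_dict(merged_sd)
    return merged

# merged_model = linear_merge([PATH_A, PATH_B], WEIGHTS)
# merged_model.save_pretrained("merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear")
```

Because the weights 0.9 and 0.1 already sum to 1, the normalization step here is a no-op; it matters only when the configured weights do not sum to 1.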
Potential Use Cases
As a merged model, it is suited to applications where combining the specific capabilities or knowledge domains of its constituent models is beneficial. At 1.5 billion parameters it is relatively compact, offering a good balance between performance and computational efficiency for a range of language tasks.
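A minimal usage sketch with the Hugging Face transformers library, assuming standard causal-LM loading for this checkpoint; the prompt is illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.9_linear"

# Load in bfloat16 to match the data type used during the merge.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain model merging in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```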