Zachary1150/merge_linear_cos0.9fmt0.1_MRL4096_ROLLOUT4_LR1e-6 is a 1.5-billion-parameter language model created by Zachary1150 using the linear merge method. It combines two base models, 'cos_MRL4096_ROLLOUT4_LR1e-6' and 'fmt_MRL4096_ROLLOUT4', with weights of 0.9 and 0.1 respectively, and is intended for general language understanding and generation tasks, leveraging the combined strengths of its constituent models.
Model Overview
This model, Zachary1150/merge_linear_cos0.9fmt0.1_MRL4096_ROLLOUT4_LR1e-6, is a 1.5-billion-parameter language model developed by Zachary1150. It was constructed with the linear merge method via the mergekit tool, combining two distinct pre-trained language models.
Merge Details
The merge process involved two base models:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/fmt_MRL4096_ROLLOUT4/global_step_50/actor/huggingface
These models were combined with specific weighting: the 'cos' model received a weight of 0.9 and the 'fmt' model a weight of 0.1. The merge was configured to normalize the weights and to use bfloat16 as its dtype.
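The configuration described above would correspond to a mergekit file along these lines. This is a sketch reconstructed from the card's stated settings; the actual configuration file is not included here:

```yaml
# Hypothetical mergekit config matching the card's description.
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.9
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/fmt_MRL4096_ROLLOUT4/global_step_50/actor/huggingface
    parameters:
      weight: 0.1
merge_method: linear
normalize: true
dtype: bfloat16
```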
Key Characteristics
As a product of a linear merge, this model aims to synthesize the capabilities of its parent models. The specific nature of the 'cos' and 'fmt' models (indicated by their paths) suggests an origin from research or experimental checkpoints, likely focusing on specific aspects of language modeling or reinforcement learning from human feedback (RLHF) given the 'actor' designation. Its 1.5B parameter count makes it suitable for applications requiring a balance between performance and computational efficiency.
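To make the linear merge concrete: each parameter tensor of the merged model is a weighted average of the corresponding tensors in the parents. The sketch below illustrates this with plain Python lists standing in for tensors; the function name and toy data are illustrative, not part of mergekit:

```python
# Minimal sketch of a linear merge over same-shaped parameter dicts.
# Weights mirror the card (0.9 for 'cos', 0.1 for 'fmt') and are
# normalized to sum to 1, matching the merge's normalize setting.

def linear_merge(models, weights, normalize=True):
    """Return a weighted average of parameter dicts (lists stand in for tensors)."""
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name in models[0]:
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(len(models[0][name]))
        ]
    return merged

# Toy two-element "tensors" for illustration only.
cos = {"layer.weight": [1.0, 2.0]}
fmt = {"layer.weight": [3.0, 4.0]}
merged = linear_merge([cos, fmt], [0.9, 0.1])
# 0.9*1.0 + 0.1*3.0 = 1.2 and 0.9*2.0 + 0.1*4.0 = 2.2
```

With a 0.9/0.1 split, the merged parameters stay close to the 'cos' model while absorbing a small contribution from 'fmt'.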
Potential Use Cases
Given its merged nature, this model could be explored for general text generation, summarization, or question-answering tasks where the combined strengths of its base models are beneficial. Its relatively compact size allows for deployment in environments with moderate computational resources.