Zachary1150/merge_linear_len0.3fmt0.7_MRL4096_ROLLOUT4_LR1e-6

Zachary1150/merge_linear_len0.3fmt0.7_MRL4096_ROLLOUT4_LR1e-6 is a 1.5 billion parameter language model created by Zachary1150 using a linear merge, a weighted average of two pre-trained checkpoints in which one contributes 30% of the weight and the other 70%. It supports a context length of 131072 tokens, making it suitable for tasks that require extensive contextual understanding, and is intended for applications that benefit from the combined strengths of its constituent models.


Model Overview

Zachary1150/merge_linear_len0.3fmt0.7_MRL4096_ROLLOUT4_LR1e-6 is a 1.5 billion parameter language model developed by Zachary1150. It was constructed with MergeKit's linear merge method, which combines two pre-trained source models by taking a parameter-wise weighted average, allowing their characteristics to be blended in controlled proportions.

Merge Details

This model is a blend of two base models, specifically:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface (contributing 30% weight)
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/fmt_MRL4096_ROLLOUT4/global_step_50/actor/huggingface (contributing 70% weight)

The merge was performed in the bfloat16 data type with weight normalization enabled, as specified in the merge configuration. The resulting model retains a context length of 131072 tokens.
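
To make the operation concrete, below is a minimal sketch of a linear merge with normalization, assuming both checkpoints share the same architecture and parameter names. The paths "model_a" and "model_b" are placeholders for the two checkpoints listed above, and the sketch illustrates the technique rather than reproducing MergeKit's actual implementation:

```python
import torch
from transformers import AutoModelForCausalLM

# Relative weights for the two source checkpoints (len: 0.3, fmt: 0.7).
sources = {"model_a": 0.3, "model_b": 0.7}

# "normalize" in the merge configuration rescales the weights to sum to 1
# (a no-op here, since 0.3 + 0.7 = 1 already).
total = sum(sources.values())
weights = {name: w / total for name, w in sources.items()}

# Load each source checkpoint and collect its parameter tensors.
state_dicts = {
    name: AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16
    ).state_dict()
    for name in weights
}

# Parameter-wise weighted average, accumulated in float32 for numerical
# stability, then cast back to bfloat16.
reference = next(iter(state_dicts.values()))
merged_state = {
    key: sum(
        weights[name] * sd[key].to(torch.float32)
        for name, sd in state_dicts.items()
    ).to(torch.bfloat16)
    for key in reference
}

# Write the merged weights back into a model of the same architecture.
merged = AutoModelForCausalLM.from_pretrained("model_a", torch_dtype=torch.bfloat16)
merged.load_state_dict(merged_state)
merged.save_pretrained("merged-model")
```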

Potential Use Cases

Given its merged nature and substantial context window, this model is likely suitable for applications that require:

  • Extended context processing: handling long documents, conversations, or codebases (see the loading sketch after this list).
  • Specific task performance: leveraging the combined strengths of its constituent models for particular language understanding or generation tasks; the 'len' and 'fmt' prefixes in the source checkpoint names hint at length- and format-oriented training objectives, though these are not documented.
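
For a quick start, the following is a minimal sketch of loading the model with the Hugging Face transformers library; the prompt and generation settings are illustrative placeholders, not values recommended by the model author:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_linear_len0.3fmt0.7_MRL4096_ROLLOUT4_LR1e-6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Summarize the key points of the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative settings; tune max_new_tokens and sampling for your task.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(
    tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
)
```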