Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 24, 2025 | Architecture: Transformer | Warm

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method. It combines two pre-trained language models, one focused on length formatting and another on accuracy formatting, with the aim of balancing length and accuracy considerations in generated text. With a stated context length of 131072 tokens, it is intended for applications that process very long sequences.


Model Overview

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method via mergekit, combining two distinct pre-trained models.
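
As a minimal sketch of how a merged checkpoint like this is typically loaded, the snippet below uses the Hugging Face transformers library. It assumes the repository is publicly available under the ID above and that transformers and torch are installed; the helper names load_model and generate, and the default max_new_tokens value, are illustrative and do not come from the model card.

```python
MODEL_ID = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in bfloat16 (assumes the repo is accessible)."""
    # Imports are kept local so that defining this helper has no side effects.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a prompt (illustrative helper)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("Explain model merging in one sentence.") would download the weights on first use, so it is best run on a machine with enough disk space and memory for a 1.5B-parameter model.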

Merge Details

This model is a blend of two base models, with specific weighting applied during the merge process:

  • Model 1 (Weight 0.1): /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
  • Model 2 (Weight 0.9): /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface

The merge configuration used the linear method with normalize: true and dtype: bfloat16. With weights of 0.9 and 0.1, the merged parameters are dominated by the second model, whose path suggests accuracy formatting, with a smaller contribution from the first model, whose path suggests length formatting.
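
To make the weighting concrete, here is a toy sketch of a linear merge with normalization. It is not mergekit's implementation: it assumes that a linear merge is a per-parameter weighted average and that normalize: true rescales the weights to sum to 1. The function linear_merge, the parameter name "layer.weight", and the two tiny example "models" are hypothetical.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a real state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def linear_merge(
    state_dicts: Sequence[StateDict],
    weights: Sequence[float],
    normalize: bool = True,
) -> StateDict:
    """Weighted average of matching parameters across models."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero when normalizing")
        # Rescale so the weights sum to 1.
        weights = [w / total for w in weights]
    merged: StateDict = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter from every model.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) for values in columns
        ]
    return merged


# Hypothetical two-value "models" standing in for the two checkpoints.
len_model = {"layer.weight": [1.0, 2.0]}
accfmt_model = {"layer.weight": [3.0, 4.0]}

merged = linear_merge([len_model, accfmt_model], weights=[0.1, 0.9])
print(merged)
```

Each merged value here is 0.1 times the first model's value plus 0.9 times the second's, so the result is approximately 2.8 and 3.8, much closer to the second model. Because the weights already sum to 1, normalization leaves them unchanged; it would matter if they were given as, say, 1 and 9.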

Key Characteristics

  • Merged Architecture: Combines two specialized models so that the result inherits behavior from both.
  • Linear Merge Method: Utilizes a straightforward, weighted averaging approach for model merging.
  • Parameter Count: A compact 1.5 billion parameters, making it suitable for applications where efficiency is a concern.
  • Extended Context Length: A stated context window of 131072 tokens (the listing metadata shows 32k), intended for processing and generating very long texts.
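
As a rough illustration of what the parameter count implies, the sketch below estimates the memory needed for the weights alone. It assumes exactly 1.5e9 parameters (the stated size is rounded) and 2 bytes per parameter for BF16; activations, the KV cache, and framework overhead are not included. The function name weight_memory_gib is illustrative.

```python
NUM_PARAMS = 1.5e9        # assumed: the stated "1.5B" taken at face value
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


estimate = weight_memory_gib(NUM_PARAMS, BYTES_PER_PARAM_BF16)
print(f"{estimate:.2f} GiB")  # about 2.79 GiB
```

Long prompts add to this figure, since the KV cache grows with the number of tokens in context.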