Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 24, 2025 | Architecture: Transformer | Warm

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear is a 1.5-billion-parameter language model created by Zachary1150 with the Linear merge method. It combines two pre-trained actor checkpoints from baselines_openrs into a single set of weights, so its behavior is a weighted blend of those two checkpoints. The model card reports a context length of 131072 tokens, while the listing metadata shows 32k.


Model Overview

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear is a 1.5-billion-parameter language model developed by Zachary1150. It was built with mergekit's Linear merge method from two pre-trained actor checkpoints in the baselines_openrs series. Each merged parameter is a weighted average of the corresponding parameters in the two checkpoints.

Key Characteristics

  • Merge Method: Linear merge, which computes each merged parameter as a weighted average of the corresponding parameters in the base models.
  • Base Models: Merges two specific actor checkpoints: one from accfmt_MRL4096_ROLLOUT4_LR2e-6 and another from acc_MRL4096_ROLLOUT4_LR2e-6.
  • Weighting: The merge configuration assigns a weight of 0.9 to the acc_MRL4096_ROLLOUT4_LR2e-6 model and 0.1 to the accfmt_MRL4096_ROLLOUT4_LR2e-6 model, so the merged weights lie much closer to the former.
  • Data Type: The merge was performed using bfloat16 precision.
  • Context Length: Reported as 131072 tokens, which allows very long input sequences.
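
The weighted average described above can be illustrated with a small, dependency-free sketch. This is not mergekit's implementation: linear_merge is a hypothetical helper, parameters are modeled as plain Python lists rather than tensors, and the toy values are made up. The normalize flag mirrors the option of dividing by the sum of the weights, which has no effect here because 0.9 + 0.1 = 1.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of matching parameters across checkpoints.

    state_dicts: list of {parameter_name: list of floats} (toy stand-in for tensors)
    weights:     one scalar weight per checkpoint
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    total = sum(weights) if normalize else 1.0
    if total == 0:
        raise ValueError("weights must not sum to zero when normalizing")

    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(len(tensors[0]))
        ]
    return merged


# Toy checkpoints (made-up values), weighted as in the card: 0.9 for acc, 0.1 for accfmt.
acc = {"layer.weight": [1.0, 2.0]}
accfmt = {"layer.weight": [3.0, 0.0]}
merged = linear_merge([acc, accfmt], [0.9, 0.1])
print(merged["layer.weight"])  # approximately [1.2, 1.8]
```

With these weights every merged value sits nine tenths of the way toward the acc checkpoint, which is why the result stays close to that model.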

Use Cases

This model is intended for applications that want this particular blend of its two constituent checkpoints. The reported context window also makes it a candidate for tasks with long inputs or outputs, such as:

  • Processing and generating long-form text.
  • Applications benefiting from the combined knowledge of the merged actor checkpoints.
  • Research into the effects of linear model merging with specific weighting schemes.
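
For the use cases above, a minimal loading sketch with the Hugging Face transformers library might look like the following. It assumes the repository loads through AutoModelForCausalLM and ships a tokenizer, which the card does not state explicitly. fits_in_context and generate are hypothetical helper names, and the 131072 limit is the figure reported in the card (the listing metadata shows 32k).

```python
MODEL_ID = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear"
MAX_CONTEXT_TOKENS = 131072  # as reported in the card; the listing metadata shows 32k


def fits_in_context(prompt_tokens, max_new_tokens, limit=MAX_CONTEXT_TOKENS):
    """Return True if the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_new_tokens <= limit


def generate(prompt, max_new_tokens=256):
    # Imported lazily so fits_in_context works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 matches the precision the merge was performed in.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    n_prompt = inputs["input_ids"].shape[1]
    if not fits_in_context(n_prompt, max_new_tokens):
        raise ValueError("prompt plus completion exceeds the context window")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][n_prompt:], skip_special_tokens=True)
```

Calling generate("...") downloads the weights on first use. The length check is there so that an over-long input fails early with a clear message.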