Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear

Hugging Face

Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear is a 1.5 billion parameter language model created by Zachary1150 using a linear merge method. It combines two pre-trained language models, specifically actor checkpoints from baselines_openrs, weighted 0.7 and 0.3 respectively. It is designed for applications requiring a compact yet capable model derived from merged architectures, and offers a substantial 131,072 token context length.


Model Overview

This model, merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the MergeKit tool, specifically employing the Linear merge method to combine two distinct pre-trained language models. The merging process involved actor checkpoints from baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7 and baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7, with a weighted average applied (0.7 for the first model, 0.3 for the second).
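The original merge configuration is not published with the model, but based on the description above, a MergeKit config reproducing this setup might look like the following sketch (the checkpoint paths and weights come from the description; the exact file the author used is an assumption):

```yaml
# Hypothetical MergeKit config reconstructed from the model card description.
merge_method: linear
dtype: bfloat16
parameters:
  normalize: true   # renormalize weights during the merge, as stated in the card
models:
  - model: baselines_openrs/acc_MRL4096_ROLLOUT4_LR5e-7
    parameters:
      weight: 0.7
  - model: baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7
    parameters:
      weight: 0.3
```

With MergeKit installed, a config like this is typically run via `mergekit-yaml config.yml ./output-dir`.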

Key Characteristics

  • Architecture: Merged model using the Linear method.
  • Parameter Count: 1.5 billion.
  • Context Length: 131,072 tokens.
  • Merging Configuration: bfloat16 dtype, with normalization applied during the merge.
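Conceptually, a linear merge with normalization is just a weighted average of corresponding parameters, with the weights rescaled to sum to 1. A minimal Python sketch (plain floats stand in for model tensors; `linear_merge` is an illustrative helper, not a MergeKit API):

```python
def linear_merge(params_a, params_b, w_a=0.7, w_b=0.3, normalize=True):
    """Weighted linear merge of two parameter dicts.

    With normalize=True the weights are rescaled so they sum to 1,
    mirroring MergeKit's normalization option.
    """
    if normalize:
        total = w_a + w_b
        w_a, w_b = w_a / total, w_b / total
    # Element-wise weighted average over matching parameter names.
    return {name: w_a * params_a[name] + w_b * params_b[name] for name in params_a}

a = {"layer.weight": 1.0}
b = {"layer.weight": 3.0}
merged = linear_merge(a, b)  # 0.7 * 1.0 + 0.3 * 3.0 = 1.6
```

Because of normalization, weights of 1.4 and 0.6 would produce the same result as 0.7 and 0.3; in a real merge each dict entry would be a tensor rather than a scalar.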

Potential Use Cases

This model is suitable for developers and researchers looking for:

  • A compact language model derived from a weighted merge of specialized actor checkpoints.
  • Applications benefiting from a very large context window (131k tokens).
  • Experimentation with merged model architectures for specific tasks where the constituent models excel.