Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_linear

Text generation · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_linear is a 1.5 billion parameter language model created by Zachary1150 using the Linear merge method. This model combines two pre-trained base models, "cos_MRL4096_ROLLOUT4_LR5e-7" and "accfmt_MRL4096_ROLLOUT4_LR5e-7", each contributing with a weight of 0.5. It features an extended context length of 131072 tokens, making it suitable for tasks requiring extensive contextual understanding.


Model Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the Linear merge method via mergekit, combining two distinct pre-trained base models. The merge process assigned equal weighting (0.5) to each constituent model, specifically /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface.
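The original mergekit configuration was not published with the card, but the details above (linear method, equal 0.5 weights, normalization, bfloat16) suggest a config along these lines. This is a plausible reconstruction, not the author's actual file:

```yaml
# Reconstructed mergekit config (illustrative; not the published original)
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
parameters:
  normalize: true
dtype: bfloat16
```

A config like this would typically be run with mergekit's CLI (e.g. `mergekit-yaml config.yml ./merged-model`) to produce the merged checkpoint.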

Key Characteristics

  • Architecture: A merged model, combining two base models using a linear interpolation approach.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: Features a substantial context window of 131072 tokens.
  • Merge Method: Utilizes the Linear merge method with weight normalization enabled and a bfloat16 dtype.
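At the tensor level, a normalized linear merge is just a weighted average of corresponding parameters from each model. The following is a minimal pure-Python sketch of the idea (mergekit operates on real model tensors; the toy state dicts and parameter names here are illustrative only):

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Linearly combine corresponding parameters from several models.

    state_dicts: list of {param_name: list of floats} mappings, one per model
    weights: per-model scalar weights (0.5 each for this card's merge)
    normalize: rescale weights to sum to 1, as in mergekit's normalize option
    """
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Two toy "models", each with a single two-element parameter
model_a = {"layer.weight": [1.0, 3.0]}
model_b = {"layer.weight": [3.0, 5.0]}
print(linear_merge([model_a, model_b], [0.5, 0.5]))
# → {'layer.weight': [2.0, 4.0]}
```

With equal weights and normalization, the result is simply the element-wise mean of the two checkpoints, which is exactly what a 0.5/0.5 linear merge produces.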

Potential Use Cases

Given its merged nature and large context window, this model is likely intended for applications that benefit from the combined strengths of its constituent models and require processing extensive input sequences. Specific performance characteristics would depend on the capabilities of the original base models it integrates.