Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 24, 2025 | Architecture: Transformer | Warm

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear is a 1.5 billion parameter language model created by Zachary1150. It is a merge of two pre-trained language models, combined with the Linear merge method using weights of 0.7 and 0.3. It is intended for general language tasks, with the merge meant to combine the strengths of its constituent models.

Model Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was built with the MergeKit tool using the Linear merge method.

Merge Details

The model is a composite of two pre-trained language models. The merge assigned a weight of 0.7 to one model and 0.3 to the other, so each merged parameter is a weighted average of the corresponding parameters in the two source models. The merge configuration also specified normalize: true and dtype: bfloat16. With normalize: true the weights are rescaled to sum to 1, which leaves 0.7 and 0.3 unchanged since they already do.
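As a rough illustration of what a linear merge with these settings computes, the sketch below averages two toy parameter sets with weights 0.7 and 0.3. The function name linear_merge, the toy parameter names and values, and the use of plain Python lists in place of tensors are all assumptions made for illustration; this is not MergeKit's implementation.

```python
def linear_merge(tensors, weights, normalize=True):
    """Element-wise weighted sum of same-length parameter lists.

    Hypothetical helper for illustration only. With normalize=True the
    weights are first rescaled so that they sum to 1.
    """
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per tensor")
    length = len(tensors[0])
    if any(len(t) != length for t in tensors):
        raise ValueError("all tensors must have the same length")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        weights = [w / total for w in weights]
    return [
        sum(w * t[i] for w, t in zip(weights, tensors))
        for i in range(length)
    ]


# Toy stand-ins for the two source models (values are made up).
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

# Weights 0.7 and 0.3, as stated in the merge details.
merged = {
    name: linear_merge([model_a[name], model_b[name]], [0.7, 0.3])
    for name in model_a
}
print(merged)
```

Because normalization divides by the sum of the weights, passing weights of 7 and 3 to this sketch would give the same result as 0.7 and 0.3.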

Key Characteristics

  • Parameter Count: 1.5 billion parameters.
  • Context Length: Listed here as 131,072 tokens. The page header reports a context length of 32k, so the usable window may depend on the serving configuration.
  • Merge Method: Utilizes the Linear merge method for combining model weights.
  • Origin: Created by merging two base models with weights of 0.7 and 0.3, with the aim of balanced performance across language understanding and generation tasks.

Potential Use Cases

As a 1.5 billion parameter merged model, it may suit general natural language processing applications where a balance of performance and efficiency is desired. Its context window makes it potentially useful for tasks that require long inputs.
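For readers who want to try the model on such tasks, the following is a minimal loading sketch. It assumes the repository is publicly available on the Hugging Face Hub under the ID shown on this page, that it loads as a causal language model (suggested by the text generation tag, not confirmed here), and that the torch and transformers packages are installed. The helper names load_model and generate are hypothetical.

```python
MODEL_ID = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in bfloat16 (assumes a causal LM checkpoint)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 dtype listed on this page
    )
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for one prompt (downloads weights on first use)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("Explain what a linear model merge is.") would download the weights on first use, so it is best run on a machine with enough memory for a 1.5B parameter model in BF16.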