Biscotto58/MistralNemoMegaV1_rev

Text generation · Concurrency cost: 1 · Model size: 12B · Quant: FP8 · Context length: 32k · Published: Jan 10, 2026 · Architecture: Transformer

Biscotto58/MistralNemoMegaV1_rev is a 12 billion parameter language model created by Biscotto58, developed through a DARE TIES merge of pre-trained models. This model integrates components from 'balanced_creative' and 'intelligence_fusion' to combine their respective strengths. It is designed for general language generation tasks, leveraging its merged architecture for balanced performance. The model has a context length of 32768 tokens.


Model Overview

Biscotto58/MistralNemoMegaV1_rev is a 12 billion parameter language model developed by Biscotto58. It was created using the DARE TIES merge method, a technique designed to combine the strengths of multiple pre-trained language models. The merging process utilized mergekit and involved two primary components: an intermediate model referred to as balanced_creative as the base, and intelligence_fusion as the contributing model.

Merge Details

The model's weights are the result of a configuration that balanced the contributions of its constituent models. The balanced_creative component was merged with a density of 0.75 and a weight of 0.52, while intelligence_fusion contributed with a density of 0.7 and a weight of 0.48. In a DARE TIES merge, the density controls what fraction of each model's delta parameters survives the random drop step, and the weight scales that model's contribution to the final merged tensor.
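
The merge details above map onto a mergekit configuration. The following is a hypothetical reconstruction, not the author's published file: the method name, densities, weights, and `int8_mask` setting come from this card, while the model identifiers are the shorthand names used here rather than real repository paths.

```yaml
# Hypothetical mergekit config sketch reconstructed from the card's
# stated parameters; model paths are placeholders.
merge_method: dare_ties
base_model: balanced_creative
models:
  - model: balanced_creative
    parameters:
      density: 0.75
      weight: 0.52
  - model: intelligence_fusion
    parameters:
      density: 0.7
      weight: 0.48
parameters:
  int8_mask: true
```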

Key Characteristics

  • Parameter Count: 12 billion parameters.
  • Merge Method: Utilizes the DARE TIES method for combining model weights.
  • Base Model: balanced_creative served as the foundational model for the merge.
  • Contributing Model: intelligence_fusion was integrated to enhance the base model's capabilities.
  • Configuration: The merge was run with the int8_mask parameter set to true, which computes the internal merge masks in 8-bit integer precision to reduce memory use during merging.
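
To make the DARE TIES procedure concrete, here is a deliberately simplified sketch on toy NumPy tensors, using the density and weight values from this card. This is an illustration of the general technique, not mergekit's implementation; the function name and the toy weight tensors are ours.

```python
import numpy as np

def dare_ties_merge(base, others, densities, weights, rng):
    """Simplified DARE TIES merge of `others` relative to `base`.

    DARE: randomly Drop a fraction (1 - density) of each delta And
    REscale the survivors by 1/density. TIES-style step: keep only
    contributions whose sign agrees with the weighted delta sum.
    """
    deltas = []
    for model, density, weight in zip(others, densities, weights):
        delta = model - base                          # task vector
        mask = rng.random(delta.shape) < density      # keep ~density fraction
        delta = np.where(mask, delta / density, 0.0)  # rescale survivors
        deltas.append(weight * delta)
    stacked = np.stack(deltas)
    summed = stacked.sum(axis=0)
    # Sign election: zero out contributions disagreeing with the
    # majority (weighted-sum) sign, then add the rest to the base.
    agree = np.where(np.sign(stacked) == np.sign(summed), stacked, 0.0)
    return base + agree.sum(axis=0)

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))  # stand-in for the reference weights
balanced_creative = base + 0.1 * rng.normal(size=(8, 8))
intelligence_fusion = base + 0.1 * rng.normal(size=(8, 8))

merged = dare_ties_merge(
    base,
    [balanced_creative, intelligence_fusion],
    densities=[0.75, 0.7],  # values from the model card
    weights=[0.52, 0.48],
    rng=rng,
)
print(merged.shape)  # (8, 8)
```

With density 1.0 and a single model at weight 1.0, the sketch degenerates to returning that model's weights unchanged, which is a useful sanity check on the drop-and-rescale logic.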

Potential Use Cases

Given its merged nature, this model is likely suitable for a range of general language generation tasks where a blend of creative and intelligent reasoning capabilities is beneficial. Its 12B parameter size offers a balance between performance and computational efficiency for various applications.