maxcurrent/Wiz2Beagle-7b-v1

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · License: apache-2.0 · Architecture: Transformer

Wiz2Beagle-7b-v1 is a 7-billion-parameter language model developed by maxcurrent, created by merging amazingvince/Not-WizardLM-2-7B and mlabonne/NeuralBeagle14-7B using the VortexMerge kit. The merge uses the TIES method to combine the weights of its constituent models, so the result inherits capabilities from both. It is designed for general language tasks, benefiting from the diverse training of its merged components, and has a context length of 4096 tokens.

Wiz2Beagle-7b-v1 Overview

Wiz2Beagle-7b-v1 is a 7-billion-parameter language model developed by maxcurrent. It is the product of merging two base models, amazingvince/Not-WizardLM-2-7B and mlabonne/NeuralBeagle14-7B, using the VortexMerge kit, a toolkit for combining the weights of multiple models.

Key Characteristics

  • Merge Method: The model uses the TIES merge method (trim, elect sign, and merge), which trims each model's weight deltas to their largest-magnitude entries, resolves sign conflicts between models, and averages the surviving values; see the sketch after this list. This approach aims to retain the strengths of each constituent model while limiting interference between them.
  • Base Models: It integrates amazingvince/Not-WizardLM-2-7B and mlabonne/NeuralBeagle14-7B, suggesting a blend of their respective training and fine-tuning characteristics.
  • Configuration: The merge configuration specifies per-model density and weight values (defined as gradients, i.e. varying across layers), with amazingvince/Not-WizardLM-2-7B serving as the base model for the merge. Normalization and int8 masking were enabled during merging.
  • Data Type: The model is configured to use float16 precision.
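
For intuition, here is a minimal sketch of a TIES-style merge in plain PyTorch. It is illustrative only, not the VortexMerge implementation: the function name, the demo tensors, and the uniform density/weight values are assumptions, and a real merge would also handle tokenizer alignment and the per-layer parameter gradients mentioned above.

```python
import torch

def ties_merge(base, finetuned, densities, weights, lam=1.0):
    """TIES-style merge: trim task vectors, elect a sign, then average.

    base:      dict mapping parameter names to base-model tensors
    finetuned: list of state dicts, one per fine-tuned model
    densities: fraction of each task vector to keep, by magnitude
    weights:   per-model scaling applied to the task vectors
    """
    merged = {}
    for name, base_param in base.items():
        deltas = []
        for state, density, weight in zip(finetuned, densities, weights):
            delta = state[name] - base_param  # task vector vs. the base
            # Trim: keep only the top-`density` fraction of entries by magnitude.
            k = max(1, int(delta.numel() * density))
            cutoff = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
            delta = torch.where(delta.abs() >= cutoff, delta, torch.zeros_like(delta))
            deltas.append(weight * delta)
        stacked = torch.stack(deltas)
        # Elect sign: per entry, the sign with the larger total magnitude wins.
        sign = torch.sign(stacked.sum(dim=0))
        # Merge: average only entries whose sign agrees with the elected sign.
        agree = (torch.sign(stacked) == sign) & (stacked != 0)
        summed = (stacked * agree).sum(dim=0)
        count = agree.sum(dim=0).clamp(min=1)
        merged[name] = base_param + lam * summed / count
    return merged

# Tiny demo with random tensors standing in for real model weights.
torch.manual_seed(0)
base = {"w": torch.randn(64, 64)}
tuned = [{"w": base["w"] + 0.1 * torch.randn(64, 64)} for _ in range(2)]
out = ties_merge(base, tuned, densities=[0.5, 0.5], weights=[1.0, 1.0])
```

The density and weight arguments mirror the roles those parameters play in the merge configuration described above; the actual values used for Wiz2Beagle-7b-v1 are not given on this page.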

Potential Use Cases

Given its merged lineage, Wiz2Beagle-7b-v1 is suited to general-purpose language generation and understanding tasks. Its behavior reflects the combined training and fine-tuning of its two parents, making it a versatile single checkpoint for developers who want a broad range of abilities without maintaining both source models. A minimal loading example follows.
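
The snippet below is a minimal sketch of loading the model with Hugging Face Transformers. It assumes the weights are published under the repo id maxcurrent/Wiz2Beagle-7b-v1 (inferred from the page title) and that the standard causal-LM interfaces apply.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "maxcurrent/Wiz2Beagle-7b-v1"  # assumed repo id, taken from the page title

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,  # matches the model's configured precision
    device_map="auto",
)

prompt = "Explain model merging in one short paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Keep prompt plus generation within the 4096-token context window.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```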