hotmailuser/QwenSlerp2-14B

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 5, 2025 · License: apache-2.0 · Architecture: Transformer

hotmailuser/QwenSlerp2-14B is a 14.8 billion parameter language model created by hotmailuser through a SLERP merge of sometimesanotion/Lamarck-14B-v0.6 and bamec66557/Qwen-2.5-14B-MINUS. The merge aims to combine the strengths of its constituent models, and its 32768-token context window makes it suitable for tasks that require long-range language understanding and generation. The merge configuration applies a V-shaped interpolation curve across layers, so each layer blends the two base models in different proportions.

Model Overview

hotmailuser/QwenSlerp2-14B is a 14.8 billion parameter language model developed by hotmailuser. It was created with the SLERP (Spherical Linear Interpolation) merge method, combining two pre-trained models: sometimesanotion/Lamarck-14B-v0.6 and bamec66557/Qwen-2.5-14B-MINUS. Rather than averaging weights linearly, SLERP interpolates along the arc between the two models' parameter vectors, which preserves their magnitudes and tends to blend the source models' capabilities more smoothly.
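
To make the method concrete, here is a minimal sketch of the core SLERP computation on a pair of weight tensors. This illustrates the math only; it is not mergekit's implementation, and flattening each tensor to a single vector (with a linear-interpolation fallback for near-parallel vectors) is an assumption of the sketch.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation: t=0 returns `a`, t=1 returns `b`,
    and intermediate t moves along the arc between the (flattened) tensors."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two parameter vectors.
    cos_omega = torch.dot(a_flat, b_flat) / (a_flat.norm() * b_flat.norm() + eps)
    omega = torch.acos(torch.clamp(cos_omega, -1.0, 1.0))
    sin_omega = torch.sin(omega)
    if sin_omega.abs() < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        merged = (1.0 - t) * a_flat + t * b_flat
    else:
        merged = (torch.sin((1.0 - t) * omega) / sin_omega) * a_flat \
               + (torch.sin(t * omega) / sin_omega) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

# Example: blend two toy "weight" tensors halfway along the arc.
w1 = torch.randn(4, 4)
w2 = torch.randn(4, 4)
print(slerp(0.5, w1, w2))
```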

Merge Details

The merge configuration applies a V-shaped curve to the interpolation parameter during the SLERP merge, meaning different layers are weighted differently between the constituent models: layers near the input and output lean toward one source model, while the middle layers lean toward the other. The configuration's note about "Hermes for input & output" and "WizardMath in the middle layers" appears to be a comment carried over from mergekit's example config; in this merge, the curve balances Lamarck-14B-v0.6 against Qwen-2.5-14B-MINUS across the model's processing stages.
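
To illustrate what a V-shaped curve means in practice, the sketch below stretches a list of anchor values for the interpolation factor t across the layer stack, mirroring how mergekit applies gradient values. The anchor list is the V-shaped example from mergekit's documentation and the 48-layer depth matches Qwen2.5-14B-class models; the exact numbers in this merge's config may differ.

```python
import numpy as np

# Anchor values for the interpolation factor t across network depth.
# [0, 0.5, 0.3, 0.7, 1] is mergekit's documented V-shaped example,
# not necessarily the exact values used for this merge.
anchors = [0.0, 0.5, 0.3, 0.7, 1.0]
num_layers = 48  # Qwen2.5-14B-class models use 48 transformer layers.

# Spread the anchors evenly over the layers and linearly interpolate
# a per-layer t (t=0 -> first source model, t=1 -> second source model).
anchor_pos = np.linspace(0.0, 1.0, num=len(anchors))
layer_pos = np.linspace(0.0, 1.0, num=num_layers)
layer_t = np.interp(layer_pos, anchor_pos, anchors)

for i, t in enumerate(layer_t):
    print(f"layer {i:2d}: t = {t:.3f}")
```

With these anchors, t rises from 0 at the input, dips back toward the first model around the middle of the stack, and climbs to 1 at the output, which is what gives the curve its "V" shape.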

Key Characteristics

  • Parameter Count: 14.8 billion parameters.
  • Context Length: Supports a substantial context window of 32768 tokens (see the loading sketch after this list).
  • Merge Method: SLERP, allowing a nuanced layer-wise blend of the source models' features.
  • Constituent Models: Merges sometimesanotion/Lamarck-14B-v0.6 and bamec66557/Qwen-2.5-14B-MINUS.
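
As a usage reference, the sketch below loads the model with Hugging Face transformers. It assumes the checkpoint is published on the Hub under the hotmailuser/QwenSlerp2-14B id and ships a Qwen-style chat template, which is typical for merges of Qwen2.5-based models.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hotmailuser/QwenSlerp2-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available GPUs
)

messages = [{"role": "user", "content": "Explain SLERP merging in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```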

Potential Use Cases

Because it blends two base models with layer-dependent weighting, this model may be particularly effective for:

  • General language generation and understanding: Benefiting from the combined strengths of its base models.
  • Tasks requiring balanced performance: Where a blend of different model characteristics is desired rather than a single dominant one.
  • Exploratory research: For developers interested in the effects of advanced merging techniques on model performance.