allknowingroger/QwenSlerp6-14B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 28, 2024 · License: apache-2.0 · Architecture: Transformer

allknowingroger/QwenSlerp6-14B is a 14.8 billion parameter language model created by allknowingroger, merged with the SLERP method from CultriX/SeQwence-14Bv1 and allknowingroger/Qwenslerp2-14B. The merge applies a V-shaped interpolation curve, varying the blend ratio between the two base models across the network's layers. The model is intended for general language tasks and posts an average score of 39.02 on the Open LLM Leaderboard.


Model Overview

QwenSlerp6-14B combines two Qwen-based models, CultriX/SeQwence-14Bv1 and allknowingroger/Qwenslerp2-14B, via SLERP (Spherical Linear Interpolation). Rather than averaging weights linearly, SLERP interpolates along the arc between the two models' weight vectors, preserving the geometry of the weight space and typically producing a smoother blend of the parents' capabilities.
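For reference, SLERP with interpolation factor $t \in [0, 1]$ between flattened weight vectors $w_0$ and $w_1$ is computed as:

$$\text{slerp}(w_0, w_1, t) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, w_0 + \frac{\sin(t\,\Omega)}{\sin \Omega}\, w_1, \qquad \Omega = \arccos\left(\frac{w_0 \cdot w_1}{\lVert w_0 \rVert \, \lVert w_1 \rVert}\right)$$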

Key Characteristics

  • Merge Method: Employs the SLERP technique for combining pre-trained models, allowing for a nuanced blend of their capabilities.
  • Configuration: The merge uses a V-shaped curve for the interpolation factor, varying the blend layer by layer (e.g., potentially emphasizing one base model at the input/output layers and the other in the middle layers); see the sketch after this list.
  • Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and generating more coherent, extended outputs.
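The snippet below is a minimal sketch of the two ideas above: a SLERP function over weight tensors and a V-shaped schedule for the per-layer interpolation factor. The `v_shaped_schedule` helper and its endpoint values are illustrative assumptions, not the model's actual merge recipe (which would live in its mergekit configuration).

```python
import numpy as np

def slerp(w0: np.ndarray, w1: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    v0, v1 = w0.flatten(), w1.flatten()
    # Angle between the two weight vectors.
    cos_omega = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1.0 - t) * w0 + t * w1
    s0 = np.sin((1.0 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return (s0 * v0 + s1 * v1).reshape(w0.shape)

def v_shaped_schedule(num_layers: int, lo: float = 0.1, hi: float = 0.9) -> list[float]:
    """Hypothetical V-shaped interpolation factors: 'hi' at the first and last
    layers, dipping to 'lo' at the middle layer (assumes num_layers > 1)."""
    mid = (num_layers - 1) / 2
    return [lo + (hi - lo) * abs(i - mid) / mid for i in range(num_layers)]

# Example: per-layer interpolation factors for a 48-layer model.
print(v_shaped_schedule(48))
```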

Performance Highlights

Evaluated on the Open LLM Leaderboard, QwenSlerp6-14B demonstrates a balanced performance across various benchmarks:

  • Average Score: 39.02 across the leaderboard's benchmarks.
  • IFEval (0-Shot): Scores 68.67, indicating strong instruction-following capability.
  • BBH (3-Shot): Reaches 47.59, showing proficiency in complex reasoning tasks.
  • MMLU-PRO (5-Shot): Scores 48.95, reflecting general knowledge and problem-solving ability.

Use Cases

This model is suitable for a broad range of applications requiring robust language understanding and generation, particularly where a balance of instruction following, reasoning, and general knowledge is beneficial. Its substantial context length makes it well-suited for tasks involving longer documents or conversations.
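Since the card does not include a usage snippet, here is a minimal sketch for loading the model with the Hugging Face transformers library. It assumes the repository ships a Qwen-style chat template and that a GPU with enough memory for the 14.8B weights (or a quantized variant) is available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/QwenSlerp6-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

# Format the prompt with the model's chat template (assumed present in the repo).
messages = [{"role": "user", "content": "Summarize the idea behind SLERP model merging."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```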