allknowingroger/QwenSlerp6-14B
allknowingroger/QwenSlerp6-14B is a 14.8 billion parameter language model created by allknowingroger, merged with the SLERP method from CultriX/SeQwence-14Bv1 and allknowingroger/Qwenslerp2-14B. The merge applies a V-shaped interpolation curve across the model's layers, so different depths draw more heavily on one parent model than the other. It is designed for general language tasks and reaches an average score of 39.02 on the Open LLM Leaderboard.
Model Overview
allknowingroger/QwenSlerp6-14B is a 14.8 billion parameter language model developed by allknowingroger, utilizing a SLERP (Spherical Linear Interpolation) merge method. This model combines the strengths of two base models: CultriX/SeQwence-14Bv1 and allknowingroger/Qwenslerp2-14B.
Key Characteristics
- Merge Method: Employs SLERP (spherical linear interpolation) to combine the two pre-trained parents, blending their weights along an arc in parameter space rather than averaging them linearly.
- Configuration: The interpolation factor follows a V-shaped curve across layers, so the input/output layers lean toward one parent model while the middle layers lean toward the other (a conceptual sketch of this schedule follows this list).
- Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer inputs and generation of more coherent, extended outputs.
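The exact merge recipe is not reproduced here, but the sketch below illustrates in plain Python/NumPy how a SLERP merge with a V-shaped layer schedule works conceptually. The layer count, the schedule values, and the orientation of the V are assumptions made for illustration, not the actual QwenSlerp6-14B configuration.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns v0 (parent A), t=1 returns v1 (parent B).
    """
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    theta = np.arccos(dot)          # angle between the two weight vectors
    if theta < eps:                 # nearly parallel: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * v0 + (np.sin(t * theta) / sin_theta) * v1

# Hypothetical V-shaped schedule over 48 transformer layers (layer count assumed):
# t is 1.0 at the first and last layers (fully parent B in this sketch) and
# dips to 0.0 at the middle layer (fully parent A).
num_layers = 48
t_schedule = [abs(2 * i / (num_layers - 1) - 1) for i in range(num_layers)]

# For each layer i, the merged weights would be:
#   merged_layer_i = slerp(t_schedule[i], weights_A_layer_i, weights_B_layer_i)
```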
Performance Highlights
Evaluated on the Open LLM Leaderboard, QwenSlerp6-14B demonstrates balanced performance across various benchmarks:
- Average Score: Achieves an average score of 39.02.
- IFEval (0-Shot): Scores 68.67, indicating strong instruction following capabilities.
- BBH (3-Shot): Reaches 47.59, showing proficiency in complex reasoning tasks.
- MMLU-PRO (5-Shot): Scores 48.95, reflecting general knowledge and problem-solving abilities.
Use Cases
This model is suitable for a broad range of applications requiring robust language understanding and generation, particularly where a balance of instruction following, reasoning, and general knowledge is beneficial. Its substantial context length makes it well-suited for tasks involving longer documents or conversations.
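As a quick start, the following is a minimal, illustrative sketch of loading the model with the Hugging Face transformers library and running a chat-style generation. The prompt, dtype, and generation settings are assumptions, not settings prescribed by this model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/QwenSlerp6-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread the 14.8B parameters across available devices
)

# Qwen-style chat formatting; the system and user messages are illustrative.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key idea behind spherical linear interpolation."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```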