CultriX/Qwen2.5-14B-Brocav6

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Context Length: 32k · Published: Dec 23, 2024 · Architecture: Transformer

CultriX/Qwen2.5-14B-Brocav6 is a 14.8 billion parameter language model developed by CultriX, built upon the Qwen2.5 architecture. It is a merge of multiple specialized Qwen2.5-14B variants, combined with the della_linear method to improve performance across diverse benchmarks. The model is particularly optimized for advanced reasoning, mathematical tasks, and instruction following, and its 32,768-token context length supports complex, long-context problem-solving.


CultriX/Qwen2.5-14B-Brocav6: A Merged Qwen2.5 Model

CultriX/Qwen2.5-14B-Brocav6 is a 14.8 billion parameter language model created by CultriX by merging several pre-trained Qwen2.5-14B models. The merge was performed with the della_linear method via mergekit, using CultriX/Qwen2.5-14B-Wernickev3 as the base model.
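As a rough sketch, a della_linear merge of this shape can be run with mergekit's Python API. The merge method and base model below follow the description above; the listed source models' weights and densities, and the epsilon/lambda values, are illustrative placeholders, not the actual Brocav6 configuration.

```python
# Minimal sketch of a della_linear merge with mergekit.
# Base model and merge method are from the model card; per-model weights,
# densities, and epsilon/lambda values here are assumed placeholders.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YAML = """
merge_method: della_linear
base_model: CultriX/Qwen2.5-14B-Wernickev3
dtype: bfloat16
models:
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.3      # hypothetical blend weight
      density: 0.6     # hypothetical fraction of delta parameters kept
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.3
      density: 0.6
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.4
      density: 0.6
parameters:
  epsilon: 0.05        # spread of DELLA drop probabilities (assumed value)
  lambda: 1.0          # rescaling factor for merged deltas (assumed value)
  normalize: true
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))
run_merge(
    merge_config,
    out_path="./Qwen2.5-14B-Brocav6-repro",
    options=MergeOptions(cuda=False, copy_tokenizer=True),
)
```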

Key Capabilities and Optimizations

The model's strength lies in its targeted optimization across a broad range of tasks, achieved by blending models with complementary proficiencies. The merge configuration prioritized performance on various benchmarks, including the following (a usage sketch with the merged model follows this list):

  • Logical Reasoning: Enhanced for tasks like tinyArc and tinyWinogrande, leveraging contributions from models such as CultriX/Qwen2.5-14B-Broca.
  • Mathematical Reasoning: Significantly boosted for MATH benchmarks, incorporating qingy2019/Qwen2.5-Math-14B-Instruct.
  • Instruction Following: Prioritized for IFEval, benefiting from djuna/Q2.5-Veltha-14B-0.5's strong performance.
  • Domain Knowledge & Multitask Performance: Improved for tinyMMLU and MMLU-Pro through models like CultriX/Qwenfinity-2.5-14B and allknowingroger/QwenSlerp6-14B.
  • Factual Reasoning & Contextual Understanding: Strengthened for tinyTruthfulQA and tinyHellaswag.
  • Multi-step Reasoning: Enhanced for MUSR and BBH tasks.

Merge Details

The della_linear merge method was configured with explicit epsilon, lambda, and normalize parameters to control how the models were blended. Adaptive merge parameters assigned higher task weights to critical areas such as MATH (2.2), IFEval (2.0), and tinyTruthfulQA (1.95) to maximize their respective contributions, and gradient clipping was applied to the individual source models to balance their influence.
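The card does not spell out how these adaptive task weights were converted into per-model merge weights. One plausible, purely hypothetical reading is a task-weighted average of each source model's benchmark scores, normalized across models, as sketched below; the benchmark scores shown are invented for illustration.

```python
# Hypothetical illustration: turning the card's task weights (MATH 2.2,
# IFEval 2.0, tinyTruthfulQA 1.95) into normalized per-model merge weights.
# The scoring scheme and the example scores are assumptions, not CultriX's
# actual procedure.
TASK_WEIGHTS = {"MATH": 2.2, "IFEval": 2.0, "tinyTruthfulQA": 1.95}

# Illustrative benchmark scores (0-1) for two candidate source models.
scores = {
    "qingy2019/Qwen2.5-Math-14B-Instruct": {
        "MATH": 0.62, "IFEval": 0.48, "tinyTruthfulQA": 0.55,
    },
    "djuna/Q2.5-Veltha-14B-0.5": {
        "MATH": 0.41, "IFEval": 0.66, "tinyTruthfulQA": 0.58,
    },
}

def weighted_score(model_scores: dict[str, float]) -> float:
    """Task-weighted average of a model's benchmark scores."""
    total_weight = sum(TASK_WEIGHTS.values())
    return sum(TASK_WEIGHTS[t] * model_scores[t] for t in TASK_WEIGHTS) / total_weight

raw = {name: weighted_score(s) for name, s in scores.items()}
total = sum(raw.values())
merge_weights = {name: round(v / total, 3) for name, v in raw.items()}
print(merge_weights)  # {'qingy2019/...': 0.503, 'djuna/...': 0.497}
```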