CultriX/Qwen2.5-14B-Brocav6: A Merged Qwen2.5 Model
CultriX/Qwen2.5-14B-Brocav6 is a 14.8-billion-parameter language model created by CultriX through a merge of several pre-trained Qwen2.5-14B variants. The merge was performed with the della_linear method via mergekit, using CultriX/Qwen2.5-14B-Wernickev3 as the base model. The resulting model is optimized for advanced reasoning, mathematical tasks, and instruction following, and supports a 32768-token context length for complex problem solving.
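The published merge configuration is not reproduced here, but a mergekit della_linear config generally takes the following shape. The parameter values, per-model weights, and densities below are illustrative placeholders, not the actual Brocav6 settings:

```yaml
# Illustrative sketch of a della_linear mergekit config (values are hypothetical).
merge_method: della_linear
base_model: CultriX/Qwen2.5-14B-Wernickev3
dtype: bfloat16
parameters:
  epsilon: 0.05      # hypothetical
  lambda: 1.0        # hypothetical
  normalize: true
models:
  - model: CultriX/Qwen2.5-14B-Broca
    parameters:
      weight: 0.3    # hypothetical per-model weight
      density: 0.6
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.25
      density: 0.6
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.25
      density: 0.6
```

Running `mergekit-yaml config.yaml ./output-dir` with such a file produces the merged checkpoint.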
Key Capabilities and Optimizations
The model's unique strength lies in its targeted optimization for a broad range of tasks, achieved by carefully blending models with specific proficiencies. The merge configuration prioritized performance across various benchmarks, including:
- Logical Reasoning: Enhanced for tasks like tinyArc and tinyWinogrande, leveraging contributions from models such as CultriX/Qwen2.5-14B-Broca.
- Mathematical Reasoning: Significantly boosted for MATH benchmarks, incorporating qingy2019/Qwen2.5-Math-14B-Instruct.
- Instruction Following: Prioritized for IFEval, benefiting from djuna/Q2.5-Veltha-14B-0.5's strong performance.
- Domain Knowledge & Multitask Performance: Improved for tinyMMLU and MMLU-PRO through models like CultriX/Qwenfinity-2.5-14B and allknowingroger/QwenSlerp6-14B.
- Factual Reasoning & Contextual Understanding: Strengthened for tinyTruthfulQA and tinyHellaswag.
- Multi-step Reasoning: Enhanced for MUSR and BBH tasks.
Merge Details
The della_linear merge was configured with explicit epsilon, lambda, and normalize parameters to control how the source models are blended. Adaptive task weights emphasized the most critical benchmarks, such as MATH (2.2), IFEval (2.0), and tinyTruthfulQA (1.95), to maximize their contributions, and gradient clipping was applied to each source model to balance its influence on the merged weights.
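Conceptually, della_linear combines DELLA-style delta pruning with a normalized linear merge: each fine-tuned model contributes a task vector (its weights minus the base model's), the low-magnitude entries of that vector are dropped, and the pruned vectors are summed with per-model weights before being added back to the base. The toy NumPy sketch below approximates this with deterministic magnitude pruning (the real DELLA method drops delta entries stochastically, with epsilon shaping the drop probabilities); function and parameter names are illustrative, though `weight`, `density`, and `normalize` mirror mergekit's configuration keys:

```python
# Toy sketch of a della_linear-style merge on small tensors.
# Simplification: deterministic magnitude pruning stands in for DELLA's
# stochastic delta dropping. Not the actual mergekit implementation.
import numpy as np

def della_linear_merge(base, variants, weights, density=0.6, normalize=True):
    """Merge fine-tuned `variants` into `base` via pruned, weighted task vectors."""
    merged_delta = np.zeros_like(base)
    for model, weight in zip(variants, weights):
        delta = model - base                       # task vector for this variant
        k = int(delta.size * density)              # number of entries to keep
        if k > 0:
            threshold = np.sort(np.abs(delta).ravel())[-k]
        else:
            threshold = np.inf
        pruned = np.where(np.abs(delta) >= threshold, delta, 0.0)
        merged_delta += weight * pruned            # weighted linear combination
    if normalize:                                  # normalize by total weight
        merged_delta /= sum(weights)
    return base + merged_delta

# Tiny worked example: two "fine-tuned" variants of a zero base tensor.
base = np.zeros((2, 2))
v1 = np.array([[1.0, 0.0], [0.0, 0.1]])
v2 = np.array([[0.0, 2.0], [0.2, 0.0]])
merged = della_linear_merge(base, [v1, v2], weights=[1.0, 1.0], density=0.5)
```

With density 0.5, only the two largest-magnitude deltas of each variant survive pruning before the normalized sum, which is how the merge keeps each source model's strongest task-specific changes while discarding noise.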