CultriX/Qwen2.5-14B-Wernickev3

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32K · Published: Dec 19, 2024 · Architecture: Transformer

CultriX/Qwen2.5-14B-Wernickev3 is a 14.8 billion parameter language model based on the Qwen2.5 architecture, created through a DARE TIES merge of five specialized Qwen-based models. With a 32K context length, the model combines the strengths of its components, including enhanced reasoning, factual knowledge, and domain expertise. It is designed for applications that need robust, multi-faceted language understanding and generation drawn from a blend of high-performing base models.


Overview

CultriX/Qwen2.5-14B-Wernickev3 is a 14.8 billion parameter language model built upon the Qwen2.5-14B base. It was created with the DARE TIES merge method, combining five distinct Qwen-based models: allknowingroger/QwenSlerp6-14B, allknowingroger/QwenStock3-14B, CultriX/SeQwence-14B-EvolMerge, CultriX/Qwen2.5-14B-Wernicke, and VAGOsolutions/SauerkrautLM-v2-14b-DPO. The merge was configured to prioritize specific capabilities from each component model, such as reasoning, factual grounding, mathematical performance, and domain knowledge, resulting in a capable and versatile model.
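To make the merge method concrete, here is a minimal toy sketch of the two ideas behind DARE TIES, applied to small NumPy arrays rather than real model weights: DARE randomly drops elements of each model's delta (fine-tuned weights minus base weights) and rescales the survivors, and TIES elects a majority sign per parameter and averages only the agreeing contributions. The function names, drop rate, and weights below are illustrative assumptions, not the configuration actually used for this model.

```python
import numpy as np

def dare(delta, drop_rate, rng):
    # DARE: randomly drop delta parameters with probability `drop_rate`,
    # then rescale the survivors by 1/(1 - drop_rate) to preserve expectation.
    mask = rng.random(delta.shape) >= drop_rate
    return delta * mask / (1.0 - drop_rate)

def ties_merge(base, deltas, weights):
    # TIES: elect a majority sign per parameter (by weighted sum of deltas),
    # then average only the contributions that agree with the elected sign.
    stacked = np.stack([w * d for w, d in zip(weights, deltas)])
    elected = np.sign(stacked.sum(axis=0))
    agree = np.sign(stacked) == elected
    kept = np.where(agree, stacked, 0.0)
    denom = np.maximum(agree.sum(axis=0), 1)  # avoid division by zero
    return base + kept.sum(axis=0) / denom

# Toy demo: three "fine-tuned" deltas merged onto a zero base vector.
rng = np.random.default_rng(0)
base = np.zeros(6)
deltas = [rng.normal(size=6) for _ in range(3)]
deltas = [dare(d, drop_rate=0.5, rng=rng) for d in deltas]
merged = ties_merge(base, deltas, weights=[1.0, 1.0, 1.0])
```

In the real merge these operations run per weight tensor across the five source checkpoints, with per-model weights and densities controlling how much each contributes.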

Key Capabilities

  • Enhanced Reasoning: Integrates robust reasoning capabilities from models like QwenSlerp6-14B.
  • Factual Knowledge: Benefits from components like SauerkrautLM-v2-14b-DPO for a strong factual baseline.
  • Domain Expertise: Incorporates MMLU-PRO enhancements from QwenStock3-14B for diverse subject expertise.
  • Question Answering: Leverages GPQA performance from Qwen2.5-14B-Wernicke.
  • Broad Task Performance: Designed to handle a wide array of tasks by blending the strengths of its constituent models.

Good For

  • Applications requiring a balanced model with strong performance across reasoning, factual recall, and diverse knowledge domains.
  • Use cases where a merged model can offer a synergistic advantage over individual base models.
  • Developers looking for a Qwen2.5-based model with a broad set of integrated capabilities.
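For developers who want to experiment with a similar merge, the configuration below sketches what a mergekit DARE TIES recipe over the same five source models could look like. The exact weights, densities, and base model used for this merge are not published on this page; all numeric values here are illustrative assumptions.

```yaml
# Hypothetical mergekit config; weights/densities are placeholders, not the published values.
merge_method: dare_ties
base_model: Qwen/Qwen2.5-14B
models:
  - model: allknowingroger/QwenSlerp6-14B
    parameters: {weight: 0.2, density: 0.5}
  - model: allknowingroger/QwenStock3-14B
    parameters: {weight: 0.2, density: 0.5}
  - model: CultriX/SeQwence-14B-EvolMerge
    parameters: {weight: 0.2, density: 0.5}
  - model: CultriX/Qwen2.5-14B-Wernicke
    parameters: {weight: 0.2, density: 0.5}
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
    parameters: {weight: 0.2, density: 0.5}
dtype: bfloat16
```

Here `density` controls the fraction of delta parameters DARE retains per model, and `weight` scales each model's contribution before TIES sign election.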