allknowingroger/Qwenslerp4-14B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 27, 2024 · Architecture: Transformer

allknowingroger/Qwenslerp4-14B is a 14.8-billion-parameter language model based on Qwen/Qwen2.5-14B, created by allknowingroger using the DARE TIES merge method. It integrates several specialized Qwen2.5-14B variants to improve performance on reasoning and factual-understanding tasks, and is tuned with benchmarks such as MATH, MUSR, GPQA, and IFEval in mind, making it suitable for complex problem-solving and knowledge-intensive applications.
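The card does not include usage code, but assuming the checkpoint loads through the standard Hugging Face transformers API, as Qwen2.5-based models typically do, inference would look roughly like this sketch (the prompt is illustrative only):

```python
# Minimal inference sketch, assuming standard transformers support for this
# checkpoint (not confirmed by the card). device_map="auto" requires accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allknowingroger/Qwenslerp4-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # defer to the checkpoint's dtype (merge used bfloat16)
    device_map="auto",
)

# Qwen2.5-family models ship a chat template; apply it for instruction prompts.
messages = [{"role": "user",
             "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```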


Overview

allknowingroger/Qwenslerp4-14B is built on the Qwen/Qwen2.5-14B base and uses the DARE TIES merge method to combine four Qwen2.5-14B variants: CultriX/Qwen2.5-14B-Wernicke, VAGOsolutions/SauerkrautLM-v2-14b-DPO, rombodawg/Rombos-LLM-V2.6-Qwen-14b, and allknowingroger/Qwenslerp2-14B. The merge is intended to consolidate the specific strengths of each component model into a single checkpoint.
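For intuition, DARE TIES combines two ideas: DARE randomly drops most of each fine-tune's parameter deltas and rescales the survivors so the expected update is preserved, and TIES resolves sign conflicts between models by per-parameter majority vote before averaging. The toy sketch below illustrates both steps on small tensors; it is not the mergekit implementation used to build this model.

```python
# Toy DARE TIES illustration on small tensors (not the real merge code).
import torch

def dare(delta, drop_p=0.5, generator=None):
    # DARE: drop each delta component with probability drop_p, then rescale
    # the survivors by 1 / (1 - drop_p) to preserve the expected update.
    mask = (torch.rand(delta.shape, generator=generator) >= drop_p).float()
    return delta * mask / (1.0 - drop_p)

def ties_merge(deltas, weights):
    # TIES: elect a per-parameter sign by weighted majority, then average
    # only the contributions whose sign agrees with the elected one.
    stacked = torch.stack([w * d for w, d in zip(weights, deltas)])
    elected = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected).float()
    denom = agree.sum(dim=0).clamp(min=1.0)  # avoid division by zero
    return (stacked * agree).sum(dim=0) / denom

# Two fine-tuned variants expressed as deltas from a shared base.
base = torch.zeros(6)
ft_a = torch.tensor([0.4, -0.2, 0.1, 0.0, 0.3, -0.5])
ft_b = torch.tensor([0.2, 0.1, -0.3, 0.0, 0.2, -0.1])

g = torch.Generator().manual_seed(0)
deltas = [dare(ft - base, drop_p=0.5, generator=g) for ft in (ft_a, ft_b)]
merged = base + ties_merge(deltas, weights=[0.5, 0.5])
print(merged)
```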

Key Capabilities

  • Enhanced Reasoning: Prioritizes performance in reasoning-heavy tasks such as MATH and MUSR, with specific task weights applied during the merge.
  • Factual Recall & Understanding: Boosts accuracy on GPQA while maintaining consistent knowledge representation on MMLU-Pro.
  • Instruction Following: Designed to maintain high IFEval performance, indicating strong adherence to instructions.
  • Optimized for Efficiency: Uses int8_mask and the bfloat16 dtype for memory and compute efficiency, plus weight normalization (normalize) for scale consistency; a hypothetical configuration sketch follows this list.
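The int8_mask, normalize, and dtype settings above are configuration options in mergekit, the toolkit commonly used for DARE TIES merges. The actual recipe for this model is not reproduced in this card, so the sketch below is a hypothetical mergekit-style configuration (loaded in Python for consistency with the other examples); every per-model weight and density is a placeholder, not a published value.

```python
# Hypothetical mergekit-style DARE TIES configuration. The base model and the
# four component models come from this card; all weight/density values are
# placeholders and do NOT reflect the actual Qwenslerp4-14B recipe.
import yaml

merge_config = yaml.safe_load("""
merge_method: dare_ties
base_model: Qwen/Qwen2.5-14B
models:
  - model: CultriX/Qwen2.5-14B-Wernicke
    parameters: {weight: 0.25, density: 0.5}    # placeholder
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
    parameters: {weight: 0.25, density: 0.5}    # placeholder
  - model: rombodawg/Rombos-LLM-V2.6-Qwen-14b
    parameters: {weight: 0.25, density: 0.5}    # placeholder
  - model: allknowingroger/Qwenslerp2-14B
    parameters: {weight: 0.25, density: 0.5}    # placeholder
parameters:
  int8_mask: true     # store merge masks as int8 to reduce memory
  normalize: true     # renormalize weights for scale consistency
dtype: bfloat16       # dtype used during the merge
""")

print(merge_config["merge_method"], "->",
      [m["model"] for m in merge_config["models"]])
```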

Good for

  • Applications requiring strong mathematical and logical reasoning.
  • Tasks demanding high factual accuracy and general knowledge.
  • Use cases where robust instruction following is critical.
  • Developers seeking a merged model that balances conversational ability with specialized benchmark performance.