KaraKaraWitch/Matsutei-Qwen2.5-72b
Text Generation · Concurrency Cost: 4 · Model Size: 72.7B · Quant: FP8 · Ctx Length: 32K · Published: Nov 16, 2024 · Architecture: Transformer

KaraKaraWitch/Matsutei-Qwen2.5-72b is a 72.7-billion-parameter language model based on the Qwen2.5 architecture, created by KaraKaraWitch through a TIES merge of several pre-trained models. The merge targets a specific problem: confusion introduced when world book lore information is injected into the context. The underlying architecture supports a 131,072-token context length (served here with a 32K window), making the model well suited to extensive narrative and contextual data, and its primary strength is maintaining coherence when working with complex lore.

Matsutei-Qwen2.5-72b Overview

KaraKaraWitch/Matsutei-Qwen2.5-72b is a 72.7-billion-parameter language model built upon the Qwen2.5 architecture. It was developed by KaraKaraWitch using the TIES merge method, with EVA-UNIT-01/EVA-Qwen2.5-72B-v0.1 serving as the base model. The merge also incorporates KaraKaraWitch/SteyrCannon-Qwen2.5-72b, ZeusLabs/Chronos-Platinum-72B, and m8than/banana-2-b-72b.
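
The listing does not include the actual merge configuration, but TIES merging itself (Trim, Elect Sign, disjoint Merge) operates on task vectors, i.e. the deltas between each fine-tuned model and the shared base. Below is a minimal NumPy sketch of the core algorithm on toy tensors; the function name, `density` value, and averaging details are illustrative, not the mergekit code used to build this model.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.3):
    """Toy TIES merge: trim each task vector, elect a per-weight sign,
    then average only the values that agree with the elected sign."""
    task_vectors = [m - base for m in finetuned]

    # Trim: zero out all but the top `density` fraction of each task
    # vector by magnitude.
    trimmed = []
    for tv in task_vectors:
        k = max(int(density * tv.size), 1)
        threshold = np.sort(np.abs(tv).ravel())[-k]
        trimmed.append(np.where(np.abs(tv) >= threshold, tv, 0.0))

    stacked = np.stack(trimmed)

    # Elect sign: the dominant sign per weight, by total signed mass.
    elected = np.sign(stacked.sum(axis=0))

    # Disjoint merge: mean of the surviving values whose sign agrees.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + (stacked * agree).sum(axis=0) / counts

# Example on toy tensors standing in for model weights:
base = np.zeros(16)
tuned = [base + np.random.randn(16) for _ in range(3)]
merged = ties_merge(base, tuned, density=0.5)
```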

Key Capabilities

  • Enhanced Lore Management: Specifically designed to mitigate the "weird vibe issues" and confusion observed when injecting world book lore information in earlier merges such as SteyrCannon-Qwen2.5-72b; a prompt-layout sketch follows this list.
  • Large Context Window: The architecture supports a 131,072-token context length (32K as served here), enabling it to process and generate extensive, detailed narratives or contextual data.
  • Merged Architecture: Benefits from the combined strengths of multiple large language models, potentially offering a more robust and nuanced understanding of complex prompts.
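
In practice, "world book lore injection" usually means prepending triggered lore entries to the system prompt before each turn. The following is a minimal sketch of that pattern; the lore_book structure, entry text, and keyword matching are hypothetical illustrations, not a specific frontend's format.

```python
# Hypothetical world-book structure: keyword triggers mapped to lore entries.
# Real frontends (e.g. SillyTavern world books) use richer trigger logic.
lore_book = {
    ("matsutei", "inn"): "The Matsutei is a centuries-old inn at the forest's edge.",
    ("witch",): "Witches in this setting brew their magic rather than cast it.",
}

def build_messages(user_message: str, persona: str) -> list[dict]:
    text = user_message.lower()
    # Activate every entry whose keywords appear in the user's message.
    active = [entry for keys, entry in lore_book.items()
              if any(k in text for k in keys)]
    system = persona
    if active:
        system += "\n\n[World Info]\n" + "\n".join(active)
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_message}]

messages = build_messages("Tell me about the Matsutei inn.", "You are the narrator.")
```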

Good For

  • Applications requiring consistent and accurate integration of detailed world-building and lore.
  • Scenarios where large amounts of contextual information need to be processed without generating confused or contradictory outputs.
  • Use cases demanding a high-parameter model with a focus on narrative coherence and contextual awareness.

Popular Sampler Settings

The three most popular parameter combinations among Featherless users for this model tune the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
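
These settings map directly onto an OpenAI-compatible request. Below is a minimal sketch assuming Featherless's OpenAI-compatible endpoint; the base URL, the extra_body pass-through for the non-standard samplers (top_k, min_p, repetition_penalty), and all numeric values are assumptions for illustration, not the actual popular configurations.

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; verify against the provider docs.
client = OpenAI(base_url="https://api.featherless.ai/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="KaraKaraWitch/Matsutei-Qwen2.5-72b",
    messages=[{"role": "user", "content": "Continue the scene at the inn."}],
    # Standard OpenAI sampler parameters (placeholder values):
    temperature=0.8,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard samplers forwarded in the raw request body (placeholder values):
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```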