TheBloke/Manticore-13B-Chat-Pyg-SuperHOT-8K-fp16

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | License: other | Architecture: Transformer

TheBloke/Manticore-13B-Chat-Pyg-SuperHOT-8K-fp16 is a 13-billion-parameter Llama-based model, fine-tuned by OpenAccess AI Collective and merged with Kaio Ken's SuperHOT 8K extension. It is designed for chat-style interactions and supports an extended context length of 8192 tokens, which suits conversations that need longer memory. It targets roleplay and general chat, and its training data includes a de-duped Pygmalion subset and various instruction-augmented datasets.

Overview

TheBloke/Manticore-13B-Chat-Pyg-SuperHOT-8K-fp16 is a 13-billion-parameter model built on the Llama architecture. It merges OpenAccess AI Collective's Manticore 13B Chat with Kaio Ken's SuperHOT 8K context extension. The result is a model optimized for chat and conversational tasks with a context window extended to 8192 tokens.

Key Capabilities

  • Extended Context Window: Supports an 8192-token context length, enabling longer and more coherent conversations.
  • Chat-Optimized: Fine-tuned with chat-specific datasets, including a de-duped Pygmalion subset, and uses USER:, ASSISTANT:, <|system|>, <|user|>, and <|model|> prompting styles.
  • Diverse Training Data: Incorporates a wide array of instruction-augmented datasets such as ShareGPT, WizardLM, Wizard-Vicuna, and various reasoning and code datasets.
  • Roleplay Proficiency: Specifically trained with roleplay data, enhancing its ability to engage in character-based interactions.
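
The prompting styles listed above can be illustrated with a small prompt builder. This is a minimal sketch, not the model authors' reference implementation: the function names are hypothetical, and the exact whitespace and newline conventions between turns are assumptions that should be checked against the upstream model card before use.

```python
from typing import List, Tuple

# A turn is (role, text); role is "user" or "assistant".
Turn = Tuple[str, str]


def build_user_assistant_prompt(turns: List[Turn], system: str = "") -> str:
    """Build a USER:/ASSISTANT: style prompt (line layout is an assumption)."""
    lines = []
    if system:
        lines.append(system)
    for role, text in turns:
        tag = "USER:" if role == "user" else "ASSISTANT:"
        lines.append(f"{tag} {text}")
    # End with an open ASSISTANT: tag so the model writes the next reply.
    lines.append("ASSISTANT:")
    return "\n".join(lines)


def build_pygmalion_prompt(turns: List[Turn], system: str = "") -> str:
    """Build a <|system|>/<|user|>/<|model|> style prompt (layout is an assumption)."""
    parts = []
    if system:
        parts.append(f"<|system|>{system}")
    for role, text in turns:
        tag = "<|user|>" if role == "user" else "<|model|>"
        parts.append(f"{tag}{text}")
    # End with an open <|model|> tag so the model writes the next reply.
    parts.append("<|model|>")
    return "".join(parts)


if __name__ == "__main__":
    history = [("user", "Hello!"), ("assistant", "Hi there."), ("user", "Tell me a story.")]
    print(build_user_assistant_prompt(history))
    print(build_pygmalion_prompt(history, system="You are a friendly narrator."))
```

Either builder returns a plain string that can be passed to a tokenizer. Which style works better for a given use case is not stated in this description, so it may be worth trying both.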

Good For

  • Long-form Chat Applications: Ideal for chatbots and conversational AI requiring extended memory.
  • Roleplay Scenarios: Excels in generating creative and consistent responses for roleplaying.
  • General Instruction Following: Capable of handling a variety of instruction-based tasks due to its diverse training.
  • Developers needing fp16: Provided in fp16 PyTorch format for GPU inference and further conversions.
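
For GPU inference from the fp16 weights, loading might look like the sketch below, which assumes the Hugging Face `transformers`, `torch`, and `accelerate` packages are installed. The `trust_remote_code=True` flag is an assumption here (SuperHOT merges have typically shipped custom modelling code to enable the longer context); verify it against the repository before enabling it. The helper `fp16_weight_gib` is a hypothetical name that gives a rough weight-only memory estimate and ignores activations and the KV cache.

```python
def fp16_weight_gib(n_params: float) -> float:
    """Rough memory for the weights alone: 2 bytes per parameter in fp16."""
    return n_params * 2 / 2**30


def load_model(repo_id: str = "TheBloke/Manticore-13B-Chat-Pyg-SuperHOT-8K-fp16"):
    """Load tokenizer and fp16 model; downloads the weights on first call."""
    # Imported lazily so the estimate helper works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,  # keep the weights in fp16
        device_map="auto",          # needs `accelerate`; spreads layers over available devices
        trust_remote_code=True,     # assumption: check the repository first
    )
    return tokenizer, model


if __name__ == "__main__":
    # Treating "13B" as 13e9 parameters is an approximation.
    print(f"approx. fp16 weights: {fp16_weight_gib(13e9):.1f} GiB")
    tokenizer, model = load_model()
    prompt = "USER: Hello!\nASSISTANT:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

By this estimate the fp16 weights alone take roughly 24 GiB, so a single consumer GPU may not hold them. `device_map="auto"` can split the model across several devices or offload part of it to CPU memory.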