TheBloke/Manticore-13B-SuperHOT-8K-fp16

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP16 · Context Length: 8K · License: other · Architecture: Transformer

TheBloke/Manticore-13B-SuperHOT-8K-fp16 is a 13 billion parameter Llama-based model developed by OpenAccess AI Collective, merged with Kaio Ken's SuperHOT 8K extension. The model is specifically designed for extended context understanding: it supports an 8K context length, which makes it suitable for tasks that require processing longer inputs. It is fine-tuned on a diverse mix of datasets, including ShareGPT, WizardLM, and several instruction-augmented corpora for detailed responses and summarization.


Manticore-13B-SuperHOT-8K-fp16 Overview

This model is a 13 billion parameter Llama-based language model, a merge of OpenAccess AI Collective's Manticore 13B with Kaio Ken's SuperHOT 8K extension. The primary differentiator is its extended context window of 8192 tokens, achieved through the SuperHOT technique. This allows the model to process and generate longer sequences of text, making it highly effective for tasks that require extensive contextual understanding.
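The SuperHOT technique extends the context window by interpolating the model's rotary position embeddings (RoPE): positions in a long sequence are compressed by a fixed scale factor so they fall inside the position range the base model was trained on. A minimal sketch of that idea, in plain Python (the dimensions and base value here are illustrative defaults, not the model's exact configuration):

```python
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    """Rotary embedding angles for a single position.

    SuperHOT-style position interpolation compresses positions by
    `scale` (e.g. 2048/8192 = 0.25) so an 8K-token sequence maps into
    the 0..2048 position range the base Llama model was trained on.
    """
    return [(pos * scale) / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With scale 0.25, the last position of an 8K sequence (8191) sees the
# same angles that fractional position 2047.75 would see unscaled.
assert rope_angles(8191, scale=2048 / 8192) == rope_angles(8191 * 2048 / 8192)
```

The design trade-off is that interpolated positions sit closer together than during pretraining, which is why SuperHOT merges include a fine-tuning step at the compressed resolution.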

Key Capabilities

  • Extended Context: Leverages an 8K context window for deeper understanding of long-form content.
  • Instruction Following: Fine-tuned on diverse instruction datasets like ShareGPT, WizardLM, and GPT4-LLM-Cleaned for robust instruction adherence.
  • Reasoning and Summarization: Includes instruction-augmented datasets for detailed responses, logical reasoning (e.g., MMLU subsets), and concise summarization (e.g., openai/summarize_from_feedback).
  • Code Generation: The card's examples demonstrate generating Python code that uses memoization.
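For reference, the kind of memoized Python the code-generation examples produce resembles the following (illustrative, not the model's verbatim output):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Fibonacci with memoization: each n is computed exactly once,
    turning the naive exponential recursion into linear time."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # returns instantly; uncached recursion would take minutes
```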

Good For

  • Applications requiring processing and generating long documents or conversations.
  • Tasks benefiting from a broad contextual understanding, such as complex question answering or detailed content creation.
  • Developers looking for a 13B model with enhanced context capabilities for GPU inference, or as a base for further conversions (e.g., GPTQ, GGML).
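A minimal GPU inference sketch with Hugging Face `transformers` is shown below. The `USER: ... ASSISTANT:` prompt template is an assumption based on common Manticore prompt formats; verify it against the model card before relying on it, and note that the unquantized 13B fp16 weights need roughly 28 GB of GPU memory.

```python
def build_prompt(user_message: str) -> str:
    """Manticore-style chat prompt. The template is an assumption;
    check the model card before relying on it."""
    return f"USER: {user_message} ASSISTANT:"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Hedged fp16 GPU inference sketch using Hugging Face transformers."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/Manticore-13B-SuperHOT-8K-fp16"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",          # spread layers across available GPUs
        trust_remote_code=True,     # SuperHOT's scaled RoPE may ship as custom code
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate(build_prompt("Summarize the history of the printing press.")))
```

For CPU or lower-memory deployment, the GPTQ and GGML conversions mentioned above are usually the better starting point.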