TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16


TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16 is a 13 billion parameter language model: a merge of Monero/Manticore-13b-Chat-Pyg-Guanaco with Kaio Ken's SuperHOT 8K LoRA. This fp16 PyTorch model supports an extended context length of 8192 tokens via RoPE position scaling, and is primarily intended for GPU inference in scenarios requiring longer conversational memory or processing of extensive documents.


Overview

This model, Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16, is a 13 billion parameter language model created by TheBloke. It is a merge of the Monero/Manticore-13b-Chat-Pyg-Guanaco base model with Kaio Ken's SuperHOT 8K LoRA. The primary distinguishing feature of this model is its extended context window of 8192 tokens, achieved through a specific RoPE scaling technique.

Key Capabilities

  • Extended Context: Supports an 8K (8192 token) context length, enabling the model to process and generate longer sequences of text while maintaining coherence.
  • FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and further conversions.
  • Merged Architecture: Combines the Manticore-13B-Chat-Pyg-Guanaco base with the SuperHOT 8K LoRA, integrating their respective strengths.
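The SuperHOT 8K extension works by linearly interpolating rotary position embeddings: token positions are compressed by a fixed factor (here 2048 / 8192 = 0.25) so that an 8192-token sequence falls inside the positional range the base model saw during 2048-token pretraining. A minimal sketch of that idea, using NumPy and hypothetical parameter names (`dim`, `base`, `scale` are illustrative, not taken from the model's config):

```python
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    """Rotary-embedding angles for the given token positions.

    scale < 1 compresses positions (SuperHOT-style linear interpolation):
    with scale = 2048 / 8192 = 0.25, extended positions map back into
    the range covered by 2048-token pretraining.
    """
    # One inverse frequency per rotated dimension pair.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Interpolated (compressed) positions.
    pos = np.asarray(positions, dtype=np.float64) * scale
    # Angle matrix of shape (len(positions), dim // 2).
    return np.outer(pos, inv_freq)

# With scale 0.25, the angles at position 8188 equal the unscaled
# angles at position 2047 (8188 * 0.25 == 2047).
```

This is why the fp16 checkpoint must be served with the matching scaling factor: loading it with unscaled RoPE would place long-context tokens at positions the weights were never trained on.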

Good For

  • Long-form Content Generation: Ideal for applications requiring the model to maintain context over extensive dialogues or documents.
  • GPU Inference: Optimized for deployment on GPUs due to its fp16 PyTorch format.
  • Further Conversions: Serves as an unquantized base for users who wish to perform their own quantizations or modifications.
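As a sketch of GPU deployment, the kwargs below assemble a plausible `transformers` `from_pretrained` call for this checkpoint. The `rope_scaling` entry supplies the linear interpolation factor explicitly, since SuperHOT-era repos predate native support for it; the helper and its defaults are assumptions for illustration, not taken from the repo's config:

```python
MODEL_ID = "TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16"

def load_kwargs(max_ctx: int = 8192, base_ctx: int = 2048) -> dict:
    """Hypothetical keyword arguments for transformers' from_pretrained.

    The linear interpolation factor is target / base context = 4.0.
    """
    return {
        "torch_dtype": "float16",   # fp16 weights for GPU inference
        "device_map": "auto",       # spread layers across available GPUs
        "rope_scaling": {"type": "linear", "factor": max_ctx / base_ctx},
    }

# Sketch of actual loading (an fp16 13B model needs roughly 26 GB of GPU memory):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs())
```

Users wanting smaller footprints would typically quantize from this fp16 base (e.g. to GPTQ or GGML formats) rather than run it as-is.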