TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16
TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16 is a 13 billion parameter language model, a merged variant of Monero/Manticore-13b-Chat-Pyg-Guanaco with Kaio Ken's SuperHOT 8K. This fp16 PyTorch model is specifically engineered to support an extended context length of 8192 tokens, leveraging a custom RoPE scaling technique. It is primarily designed for GPU inference in scenarios requiring longer conversational memory or processing extensive documents.
Overview
This model, Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16, is a 13 billion parameter language model created by TheBloke. It is a merge of the Monero/Manticore-13b-Chat-Pyg-Guanaco base model with Kaio Ken's SuperHOT 8K LoRA. The primary distinguishing feature of this model is its extended context window of 8192 tokens, achieved through a specific RoPE scaling technique.
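The RoPE scaling behind SuperHOT is position interpolation: token positions are compressed by a constant factor (here 2048 / 8192 = 0.25) before the rotary angles are computed, so positions up to 8192 fall inside the range the base model saw during training. A minimal sketch of the idea (illustrative only, not the model's actual implementation):

```python
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    # Rotary angles: theta_i = (pos * scale) / base^(2i / dim).
    # scale < 1 compresses positions (SuperHOT-style interpolation).
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    pos = np.asarray(positions, dtype=np.float64) * scale
    return np.outer(pos, inv_freq)

# With scale = 2048 / 8192 = 0.25, position 8191 produces the same
# angles as unscaled position 8191 * 0.25, which is below 2048.
scaled = rope_angles([8191], scale=0.25)
plain = rope_angles([8191 * 0.25])
assert np.allclose(scaled, plain)
```

The trade-off is that nearby positions become angularly closer together, which is why SuperHOT pairs the scaling with LoRA fine-tuning rather than applying it to the base model alone.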
Key Capabilities
- Extended Context: Supports an 8K (8192 token) context length, enabling the model to process and generate longer sequences of text while maintaining coherence.
- FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and further conversions.
- Merged Architecture: Combines the Manticore-13B-Chat-Pyg-Guanaco base with the SuperHOT 8K LoRA, pairing the base model's chat tuning with SuperHOT's extended-context adaptation.
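A loading sketch with Transformers is shown below. The exact flags depend on your setup: `trust_remote_code=True` is typically needed for TheBloke's SuperHOT fp16 releases because the extended context relies on custom RoPE-scaled modeling code shipped with the repo, and `device_map="auto"` assumes the `accelerate` package is installed. Treat this as a starting point, not a verified recipe:

```python
MODEL_ID = "TheBloke/Manticore-13B-Chat-Pyg-Guanaco-SuperHOT-8K-fp16"

def load(model_id=MODEL_ID):
    # Imports are local so the sketch can be read without the heavy
    # dependencies installed; loading a 13B fp16 model needs ~26 GB.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # keep the released fp16 precision
        device_map="auto",           # shard across available GPUs (needs accelerate)
        trust_remote_code=True,      # custom RoPE-scaled modeling code
    )
    return tokenizer, model
```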
Good For
- Long-form Content Generation: Ideal for applications requiring the model to maintain context over extensive dialogues or documents.
- GPU Inference: Optimized for deployment on GPUs due to its fp16 PyTorch format.
- Further Conversions: Serves as an unquantized base for users who wish to perform their own quantizations or modifications.
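To illustrate why unquantized fp16 weights are the usual starting point for conversions, here is a toy symmetric int8 round-trip. This is a deliberately naive sketch of what quantization does, not the GPTQ or GGML procedures actually used for this model's quantized variants:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float16).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step.
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

Real quantizers (GPTQ, GGML k-quants) use per-group scales and error-aware rounding, but the storage saving is the same in spirit: int8 halves fp16's footprint, and 4-bit schemes halve it again.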