TheBloke/Koala-13B-SuperHOT-8K-fp16

Text generation · Model size: 13B · Precision: fp16 · Context length: 8K · License: other · Architecture: Transformer

TheBloke/Koala-13B-SuperHOT-8K-fp16 is a 13 billion parameter causal language model, created by TheBloke, that merges the Koala 13B model with Kaio Ken's SuperHOT 8K LoRA. This fp16 PyTorch model is designed to support an extended context length of 8192 tokens during inference. It is optimized for dialogue tasks and is suited to scenarios requiring longer conversational memory.


Model Overview

This model, Koala-13B-SuperHOT-8K-fp16, is a 13 billion parameter language model developed by TheBloke. It is a merge of the original Koala 13B model, a dialogue model from Berkeley, and Kaio Ken's SuperHOT 8K LoRA. The primary enhancement is its ability to handle an extended context length of 8192 tokens during inference, achieved through the merged SuperHOT 8K LoRA weights together with rotary position embedding (RoPE) scaling applied at inference time.
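As a rough illustration, the snippet below loads the model with Hugging Face transformers and applies linear RoPE scaling. This is a minimal sketch assuming a recent transformers release that supports the `rope_scaling` config field (the original SuperHOT releases relied on a RoPE monkey patch instead), and the scaling factor of 4.0 (2048 → 8192 tokens) is an assumption rather than something stated on this card.

```python
# Minimal sketch: load the fp16 model with an 8K context via linear RoPE scaling.
# Assumes a transformers version with `rope_scaling` support; the scaling factor
# of 4.0 (2048 -> 8192 tokens) is an assumption, not stated on this card.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Koala-13B-SuperHOT-8K-fp16"

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 8192                     # extended context window
config.rope_scaling = {"type": "linear", "factor": 4.0}   # SuperHOT-style position interpolation

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,   # fp16 weights, as shipped
    device_map="auto",           # place layers on available GPU(s)
)
```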

Key Capabilities

  • Extended Context Window: Supports an 8K (8192 token) context length, significantly improving its ability to maintain long conversations or process extensive documents.
  • Dialogue Optimization: Inherits the dialogue capabilities of the base Koala 13B model, making it suitable for conversational AI applications.
  • FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and for further conversions (e.g. to quantized formats); see the generation sketch after this list.
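As a rough usage illustration, the sketch below reuses the model and tokenizer from the loading example above to generate a reply to a dialogue prompt. The prompt template shown ("BEGINNING OF CONVERSATION: USER: ... GPT:") is the format commonly attributed to Koala and is an assumption here; this card does not specify a template, and the sampling parameters are likewise illustrative.

```python
# Hedged generation sketch, reusing `model` and `tokenizer` from the loading example above.
# The Koala-style prompt template below is an assumption; this card does not define one.
prompt = "BEGINNING OF CONVERSATION: USER: Summarise the plot of Hamlet in three sentences. GPT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```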

Good For

  • Applications requiring long-form conversational memory (a context-trimming sketch follows this list).
  • Tasks that benefit from processing larger input texts or dialogue histories.
  • Developers looking for a 13B model with enhanced context handling for dialogue-centric use cases.
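Since even an 8192-token window fills up in long conversations, older turns eventually have to be dropped or summarised. The helper below is a hypothetical sketch of the simplest approach: tokenise the running history and keep only the most recent turns that fit under a token budget. The function name, per-turn string format, and reply reserve are illustrative assumptions, not part of this model's tooling.

```python
# Hypothetical helper: trim a chat history so it fits the 8K context window.
# The 8192-token budget matches the extended context; the reserve left for the
# model's reply and the per-turn string format are illustrative assumptions.
def trim_history(turns, tokenizer, max_tokens=8192, reserve_for_reply=512):
    """Keep the most recent turns whose combined token count fits the budget."""
    budget = max_tokens - reserve_for_reply
    kept, used = [], 0
    for turn in reversed(turns):                    # walk from newest to oldest
        n_tokens = len(tokenizer(turn)["input_ids"])
        if used + n_tokens > budget:
            break
        kept.append(turn)
        used += n_tokens
    return list(reversed(kept))                     # restore chronological order

# Example: turns are alternating "USER: ..." / "GPT: ..." strings.
# prompt = "BEGINNING OF CONVERSATION: " + " ".join(trim_history(history, tokenizer)) + " GPT:"
```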