TheBloke/wizard-vicuna-13B-SuperHOT-8K-fp16
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · License: other · Architecture: Transformer

TheBloke/wizard-vicuna-13B-SuperHOT-8K-fp16 is a 13 billion parameter language model, a float16 PyTorch version of June Lee's Wizard Vicuna 13B merged with Kaio Ken's SuperHOT 8K LoRA. This model is specifically designed to leverage an extended context window of 8192 tokens, significantly enhancing its ability to process and generate longer sequences of text. It combines the conversational strengths of Wizard Vicuna with SuperHOT's context extension, making it suitable for applications requiring deep contextual understanding over extended dialogues or documents.


Model Overview

This repository provides the float16 PyTorch weights of the merged model: June Lee's Wizard Vicuna 13B combined with Kaio Ken's SuperHOT 8K LoRA. Its primary differentiator is the extended context window, supporting up to 8192 tokens during inference through the SuperHOT 8K integration.
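As a sketch of how this checkpoint might be consumed, the snippet below loads the fp16 weights with Hugging Face transformers and builds a Vicuna-style `USER:`/`ASSISTANT:` prompt. Both the prompt template and the `trust_remote_code=True` flag are assumptions based on how SuperHOT merges are typically packaged (the repo ships patched RoPE-scaling code); they are not guarantees from this model card.

```python
# Assumed consumer stack: Hugging Face transformers + PyTorch.
MODEL_ID = "TheBloke/wizard-vicuna-13B-SuperHOT-8K-fp16"

def build_prompt(turns):
    """Format (user, assistant) turns in the Vicuna 1.1 style this model
    family commonly expects; the final assistant slot is left open.
    The exact template is an assumption, not confirmed by the card."""
    parts = []
    for user, assistant in turns:
        parts.append(f"USER: {user}")
        if assistant is not None:
            parts.append(f"ASSISTANT: {assistant}")
    parts.append("ASSISTANT:")
    return "\n".join(parts)

def load_model():
    """Load the fp16 checkpoint; needs roughly 26 GB of GPU memory."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,  # SuperHOT repos bundle patched RoPE code
    )
    return tokenizer, model
```

Because calling `load_model()` downloads the full 13B checkpoint, it is defined but not invoked above; `build_prompt` can be used independently to prepare multi-turn inputs.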

Key Capabilities

  • Extended Context Handling: Leverages an 8K context window, enabling the model to maintain coherence and understanding over much longer inputs and outputs compared to standard models.
  • Enhanced Conversational Ability: Built upon Wizard Vicuna, which combines WizardLM's in-depth instruction dataset with Vicuna's multi-round conversation tuning methods, leading to improved dialogue capabilities.
  • Performance Improvement: The original Wizard Vicuna 13B showed approximately a 7% improvement over Vicuna-13B in an informal, GPT-4-scored benchmark.

Good For

  • Applications requiring processing or generating long documents, articles, or extended dialogues.
  • Use cases where maintaining context over many turns in a conversation is crucial.
  • Developers looking for a 13B model with a large context window for GPU inference, with options for further quantization (GPTQ, GGML) available from TheBloke.
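For long-document use cases, it helps to budget the 8192-token window between the prompt and the generated output. A minimal sketch, using a rough ~4 characters/token heuristic (an assumption for illustration; precise counts require the model's actual tokenizer):

```python
CTX_LIMIT = 8192      # SuperHOT 8K context window
CHARS_PER_TOKEN = 4   # rough heuristic, not the model's real tokenizer

def fit_to_context(document: str, max_new_tokens: int) -> str:
    """Truncate a document so estimated prompt tokens plus the generation
    budget fit within the 8K context window."""
    prompt_budget = CTX_LIMIT - max_new_tokens
    if prompt_budget <= 0:
        raise ValueError("max_new_tokens exhausts the context window")
    max_chars = prompt_budget * CHARS_PER_TOKEN
    return document[:max_chars]
```

For example, with `max_new_tokens=512` the prompt budget is 7680 estimated tokens, so a very long document is cut to the first 30720 characters; in production the same budgeting should be done with real token counts from the tokenizer.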