TheBloke/Vicuna-7B-v1-3-SuperHOT-8K-fp16

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · License: other · Architecture: Transformer

TheBloke/Vicuna-7B-v1-3-SuperHOT-8K-fp16 is a 7-billion-parameter language model, assembled by TheBloke, that merges LMSYS's Vicuna 7B v1.3 with Kaio Ken's SuperHOT 8K LoRA. The merge extends the context length to 8192 tokens, making the model suitable for applications that need longer conversational memory or that process extensive documents. Its primary use case is research and development of chatbots and large language models, particularly in scenarios that benefit from a larger context window.


Vicuna 7B v1.3 SuperHOT 8K fp16

This model, created by TheBloke, is a 7-billion-parameter variant that combines LMSYS's Vicuna 7B v1.3 with Kaio Ken's SuperHOT 8K LoRA. Its main innovation is a significantly extended context window of up to 8192 tokens, achieved by merging the SuperHOT 8K LoRA onto the base Vicuna model and loading the result with trust_remote_code=True, which enables the repository's custom modelling code and activates the 8K context (see the loading sketch below).
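A minimal sketch of that loading path, assuming the transformers, accelerate, and torch packages are installed and a GPU is available; only the repository name comes from this card, while the prompt and generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Vicuna-7B-v1-3-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# trust_remote_code=True lets the repo's custom modelling code run,
# which is what activates the extended 8192-token context.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,     # matches the fp16 checkpoint
    device_map="auto",             # requires the accelerate package
    trust_remote_code=True,
)

# Vicuna v1.3 uses a simple USER/ASSISTANT prompt format.
prompt = "USER: Summarize the following document.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```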

Key Capabilities

  • Extended Context Length: Supports an 8192-token context window, ideal for long-form conversations or document analysis.
  • Vicuna Base: Benefits from the conversational fine-tuning of the original Vicuna v1.3 model, which was trained on user-shared conversations from ShareGPT.
  • SuperHOT Integration: Incorporates the SuperHOT LoRA, which was trained with a focus on NSFW content and on extended-context techniques that compress rotary position indices (see the sketch after this list).
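The position-interpolation idea behind SuperHOT-style context extension can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code; the function name and the scale factor of 0.25 (mapping 8192 positions onto a 2048-position base range) are assumptions:

```python
import torch

def scaled_rope_angles(dim: int, seq_len: int, scale: float = 0.25,
                       base: float = 10000.0) -> torch.Tensor:
    """Rotary-embedding angles with interpolated (compressed) positions."""
    # Standard RoPE inverse frequencies for a head dimension of `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Compress positions so token t behaves like position t * scale;
    # with scale=0.25, 8192 tokens fit in the original 2048-position range.
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

angles = scaled_rope_angles(dim=128, seq_len=8192)
cos, sin = angles.cos(), angles.sin()  # fed into the attention rotation
```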

Good for

  • Research and Development: Primarily intended for researchers and hobbyists exploring large language models and chatbots.
  • Applications Requiring Long Context: Suitable for use cases where maintaining extensive conversational history or processing large text inputs is crucial.
  • Further Conversions: The fp16 PyTorch format makes it a good starting point for further quantization or conversion to other formats (e.g., GPTQ, GGML); a quantization sketch follows below.
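As an example of such a conversion, here is a hedged sketch of 4-bit GPTQ quantization using the auto-gptq package; the quantization settings, calibration text, and output directory are placeholder assumptions rather than values from this card, and a real conversion would use a few hundred calibration samples:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name = "TheBloke/Vicuna-7B-v1-3-SuperHOT-8K-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Common GPTQ settings; chosen here for illustration only.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(
    model_name, quantize_config, trust_remote_code=True
)

# One short calibration sample keeps this sketch self-contained;
# production conversions need a representative calibration set.
examples = [tokenizer("Vicuna is a chat assistant fine-tuned from LLaMA.",
                      return_tensors="pt")]
model.quantize(examples)
model.save_quantized("vicuna-7b-superhot-8k-gptq")
```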