TheBloke/Guanaco-7B-SuperHOT-8K-fp16

Text Generation · Model size: 7B · Precision: FP16 · Context length: 8K · License: other · Architecture: Transformer

TheBloke/Guanaco-7B-SuperHOT-8K-fp16 is a 7 billion parameter causal language model created by merging Tim Dettmers' Guanaco 7B with Kaio Ken's SuperHOT 8K LoRA. The model is designed for extended context, supporting an 8K (8192 token) context length, which makes it suitable for applications that require longer conversational memory or long-document processing. It is provided as fp16 PyTorch weights for GPU inference and as a base for further conversions such as quantization.

Model Overview

This model, TheBloke/Guanaco-7B-SuperHOT-8K-fp16, is a 7 billion parameter language model created by merging Tim Dettmers' Guanaco 7B with Kaio Ken's SuperHOT 8K LoRA. The primary differentiator of this model is its significantly extended context window, supporting up to 8192 tokens, achieved through the SuperHOT 8K integration.
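
As an illustration of how a two-LoRA merge like this is typically produced, here is a hypothetical sketch using the PEFT library. The adapter paths are placeholders, not the exact repositories used to build this model, and the actual merge procedure may have differed:

```python
# Hypothetical merge sketch with PEFT; adapter repo paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Guanaco 7B is built on the LLaMA-7B base model.
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

# Apply the Guanaco instruction-tuning LoRA and fold it into the weights...
model = PeftModel.from_pretrained(base, "path/to/guanaco-7b-lora").merge_and_unload()
# ...then apply the SuperHOT 8K context-extension LoRA the same way.
model = PeftModel.from_pretrained(model, "path/to/superhot-8k-lora").merge_and_unload()

# Export the merged fp16 weights for distribution or further conversion.
model.save_pretrained("Guanaco-7B-SuperHOT-8K-fp16")
```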

Key Capabilities

  • Extended Context Length: Supports an 8K (8192 token) context window, enabling processing of longer inputs and maintaining extended conversational memory.
  • FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and as a base for further quantization or fine-tuning.
  • LoRA Merged: Incorporates the SuperHOT 8K LoRA, which was trained with a focus on NSFW content and context extension techniques.

Usage Considerations

  • Inference: Requires trust_remote_code=True in Hugging Face Transformers so that the repository's custom modelling code can apply the 8K context scaling (a minimal loading sketch appears at the end of this section).
  • Compatibility: Quantized versions (GPTQ, GGML) are available for different hardware setups.
  • Training Details (SuperHOT LoRA): The SuperHOT LoRA was trained on 1200 samples over 3 epochs with a learning rate of 3e-4, targeting q_proj, k_proj, v_proj, and o_proj modules with a rank of 4 and alpha of 8.
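
For reference, the SuperHOT hyperparameters listed above map onto a PEFT LoraConfig. This is a hypothetical reconstruction from the stated numbers, not the original training script:

```python
# Hypothetical reconstruction of the SuperHOT LoRA adapter configuration,
# based only on the hyperparameters stated above.
from peft import LoraConfig

superhot_config = LoraConfig(
    r=4,                  # LoRA rank
    lora_alpha=8,         # LoRA scaling alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,     # assumed; not stated in the source
    task_type="CAUSAL_LM",
)
# Stated recipe: 1200 samples, 3 epochs, learning rate 3e-4.
```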
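
Finally, a minimal loading and generation sketch, assuming the standard Hugging Face Transformers API and a GPU with enough memory for 7B fp16 weights (roughly 14 GB). The prompt follows the common Guanaco "### Human: / ### Assistant:" convention and is purely illustrative:

```python
# Minimal fp16 inference sketch; trust_remote_code=True lets the
# repository's custom code apply the SuperHOT 8K positional scaling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Guanaco-7B-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are distributed in fp16
    device_map="auto",          # place layers on available GPU(s)
    trust_remote_code=True,     # required for the 8K context scaling
)

prompt = "### Human: Summarize the plot of Hamlet in three sentences.\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```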