TheBloke/Guanaco-7B-SuperHOT-8K-fp16

Text Generation · Model size: 7B · Precision: FP16 · Context length: 8K · License: other · Architecture: Transformer

TheBloke/Guanaco-7B-SuperHOT-8K-fp16 is a 7 billion parameter causal language model created by merging Tim Dettmers' Guanaco 7B with Kaio Ken's SuperHOT 8K LoRA. The model is designed for extended context, supporting an 8K (8192 token) context length, which makes it suitable for applications that require longer conversational memory or long-document processing. It is provided as fp16 PyTorch weights for GPU inference and as a base for further conversions such as quantization.

Model Overview

This model, TheBloke/Guanaco-7B-SuperHOT-8K-fp16, is a 7 billion parameter language model created by merging Tim Dettmers' Guanaco 7B with Kaio Ken's SuperHOT 8K LoRA. The primary differentiator of this model is its significantly extended context window, supporting up to 8192 tokens, achieved through the SuperHOT 8K integration.
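
As an illustration of how a two-LoRA merge like this is typically produced, here is a hypothetical sketch using the PEFT library. The adapter paths are placeholders, not the exact repositories used to build this model, and the actual merge procedure may have differed:

```python
# Hypothetical merge sketch with PEFT; adapter repo paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Guanaco 7B is built on the LLaMA-7B base model.
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

# Apply the Guanaco instruction-tuning LoRA and fold it into the weights...
model = PeftModel.from_pretrained(base, "path/to/guanaco-7b-lora").merge_and_unload()
# ...then apply the SuperHOT 8K context-extension LoRA the same way.
model = PeftModel.from_pretrained(model, "path/to/superhot-8k-lora").merge_and_unload()

# Export the merged fp16 weights for distribution or further conversion.
model.save_pretrained("Guanaco-7B-SuperHOT-8K-fp16")
```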

Key Capabilities

  • Extended Context Length: Supports an 8K (8192 token) context window, enabling processing of longer inputs and maintaining extended conversational memory.
  • FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and as a base for further quantization or fine-tuning.
  • LoRA Merged: Incorporates the SuperHOT 8K LoRA, which was trained with a focus on NSFW content and context extension techniques.

Usage Considerations

  • Inference: Requires trust_remote_code=True in Hugging Face Transformers so that the repository's custom modelling code can apply the 8K context scaling (a minimal loading sketch appears at the end of this section).
  • Compatibility: Quantized versions (GPTQ, GGML) are available for different hardware setups.
  • Training Details (SuperHOT LoRA): The SuperHOT LoRA was trained on 1200 samples over 3 epochs with a learning rate of 3e-4, targeting q_proj, k_proj, v_proj, and o_proj modules with a rank of 4 and alpha of 8.
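
For reference, the SuperHOT hyperparameters listed above map onto a PEFT LoraConfig. This is a hypothetical reconstruction from the stated numbers, not the original training script:

```python
# Hypothetical reconstruction of the SuperHOT LoRA adapter configuration,
# based only on the hyperparameters stated above.
from peft import LoraConfig

superhot_config = LoraConfig(
    r=4,                  # LoRA rank
    lora_alpha=8,         # LoRA scaling alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,     # assumed; not stated in the source
    task_type="CAUSAL_LM",
)
# Stated recipe: 1200 samples, 3 epochs, learning rate 3e-4.
```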
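
Finally, a minimal loading and generation sketch, assuming the standard Hugging Face Transformers API and a GPU with enough memory for 7B fp16 weights (roughly 14 GB). The prompt follows the common Guanaco "### Human: / ### Assistant:" convention and is purely illustrative:

```python
# Minimal fp16 inference sketch; trust_remote_code=True lets the
# repository's custom code apply the SuperHOT 8K positional scaling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Guanaco-7B-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are distributed in fp16
    device_map="auto",          # place layers on available GPU(s)
    trust_remote_code=True,     # required for the 8K context scaling
)

prompt = "### Human: Summarize the plot of Hamlet in three sentences.\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```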