TheBloke/guanaco-13B-SuperHOT-8K-fp16 is a 13 billion parameter language model, a merge of Tim Dettmers' Guanaco 13B and Kaio Ken's SuperHOT 8K LoRA. This fp16 PyTorch model is designed for GPU inference and features an extended context window of 8192 tokens. It leverages the SuperHOT technique to enhance context handling, making it suitable for applications requiring longer conversational memory or document processing.
Overview
This model, guanaco-13B-SuperHOT-8K-fp16, is a 13 billion parameter language model created by TheBloke. It is a merge of two distinct components: Tim Dettmers' Guanaco 13B base model and Kaio Ken's SuperHOT 8K LoRA. The integration of the SuperHOT 8K LoRA is specifically designed to extend the model's effective context window to 8192 tokens, a significant increase over standard models.
Key Capabilities
- Extended Context Window: Achieves an 8K (8192 token) context length, enabling the model to process and generate longer sequences of text.
- FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and as a starting point for further conversion to quantized formats.
- Merged Architecture: Combines the strengths of the Guanaco 13B base with the context-extending capabilities of the SuperHOT 8K LoRA.
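The model card does not include the context-extension code itself, but SuperHOT-style extension works by interpolating rotary position embeddings: position indices are scaled down so that an 8192-token sequence maps into the 0–2048 positional range the base model was trained on. The sketch below is illustrative only; the function name and dimensions are not from the original repository.

```python
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    """Rotary-embedding angles for the given positions.

    With scale=1.0 this is standard RoPE; a scale < 1 implements
    SuperHOT-style position interpolation, squeezing long positions
    back into the base model's trained range.
    """
    # One inverse frequency per rotated pair of dimensions.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(np.asarray(positions, dtype=np.float64) * scale, inv_freq)

# 2048 / 8192 = 0.25: token position 8191 gets the same angles as
# unscaled position 2047.75, well inside the original 2048 window.
scaled = rope_angles([8191], scale=2048 / 8192)
```

This is why the merged model can attend over 8K tokens without the base model ever having seen positions beyond 2048.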
Training Details (SuperHOT LoRA)
The SuperHOT LoRA was trained with specific configurations to achieve its extended context capabilities:
- 1200 training samples, roughly 400 of which exceed 2048 tokens in length.
- Learning rate of 3e-4 over 3 epochs.
- LoRA applied to the q_proj, k_proj, v_proj, and o_proj modules, with a rank of 4 and alpha of 8.
- Trained on a 4-bit base model.
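For reference, the reported hyperparameters can be collected into a peft-style configuration. This is a plain-dict sketch whose field names follow peft's `LoraConfig`; it is not taken from the original training script.

```python
# Reported SuperHOT LoRA hyperparameters, expressed as a
# peft-LoraConfig-style dict (field names are an assumption).
superhot_lora_config = {
    "r": 4,                                               # LoRA rank
    "lora_alpha": 8,                                      # LoRA alpha
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "learning_rate": 3e-4,                                # over 3 epochs
    "num_epochs": 3,
    "load_in_4bit": True,                                 # 4-bit base model
}
```

The low rank (4) keeps the adapter small; its job is only to teach the model the interpolated positional scheme, not new knowledge.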
Good For
- Applications requiring processing or generating long texts, such as detailed summaries, extended conversations, or document analysis.
- Developers looking for a 13B parameter model with enhanced context handling for GPU-based inference.
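A minimal inference sketch with Hugging Face transformers is shown below. The prompt template follows the "### Human: / ### Assistant:" convention commonly used with Guanaco models; `trust_remote_code` is included because SuperHOT repositories may ship custom RoPE-scaling code, and both details should be checked against the repository before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TheBloke/guanaco-13B-SuperHOT-8K-fp16"

def build_prompt(user_message: str) -> str:
    # Guanaco-style prompt format (assumed; verify against the model card).
    return f"### Human: {user_message}\n### Assistant:"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # fp16 weights, intended for GPU inference
    device_map="auto",
    trust_remote_code=True,     # may be needed for the 8K RoPE scaling
)

inputs = tokenizer(
    build_prompt("Summarize the key points of the document above."),
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

A 13B fp16 model needs roughly 26 GB of VRAM for the weights alone, plus additional memory for the KV cache, which grows with the 8K context.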