TheBloke/Koala-7B-SuperHOT-8K-fp16

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · License: other · Architecture: Transformer

TheBloke/Koala-7B-SuperHOT-8K-fp16 is a 7 billion parameter language model, a merge of the Koala 7B base model and Kaio Ken's SuperHOT 8K LoRA. This fp16 PyTorch model supports an extended context length of 8192 tokens, significantly increasing the length of text it can process and generate. It is intended primarily for GPU inference and for applications that need substantial context understanding.


Model Overview

This model, TheBloke/Koala-7B-SuperHOT-8K-fp16, is a 7 billion parameter language model derived from a merge of the original Koala 7B base model and Kaio Ken's SuperHOT 8K LoRA. It is provided in fp16 PyTorch format, suitable for GPU inference and further conversions. A key feature is its extended context window, supporting up to 8192 tokens, which is enabled through specific modeling code and configuration.
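The SuperHOT approach extends the context window by interpolating rotary position embeddings (RoPE): positions beyond the original training range are scaled down so they map back into the angular range the base model saw during training. The snippet below is a minimal NumPy sketch of that idea, not the model's actual modeling code; the function name and the assumption of a 2048-token original training length are illustrative.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Rotary-embedding angles for each (position, frequency) pair.
    # scale < 1 interpolates positions, squeezing a long window back
    # into the angular range the base model was trained on.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions * scale, inv_freq)

# Hypothetical numbers: base model trained on 2048 positions, extended
# window of 8192 positions, so the interpolation factor is 2048/8192.
orig = rope_angles(np.arange(2048), dim=128)
interp = rope_angles(np.arange(8192), dim=128, scale=2048 / 8192)
```

Scaling positions is a purely linear operation on the angles, so every interpolated angle for the 8192-token window lands (up to a fraction of one position) within the range the model already learned, which is what lets the merged model generalize to the longer context.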

Key Capabilities

  • Extended Context Window: Achieves an 8K (8192 token) context length, allowing for processing and generation of much longer texts compared to standard models.
  • Merged Architecture: Combines the Koala 7B base with the SuperHOT 8K LoRA, which was originally developed with a focus on NSFW content and extended context.
  • Flexible Configuration: The config.json defaults to an 8192-token sequence length, which can be reduced to 4096 if a shorter context is preferred.
  • Inference Support: Designed for GPU inference, with Python examples using the transformers library; trust_remote_code=True is required so the custom extended-context modeling code is loaded with the weights.
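
Loading the model for GPU inference can be sketched as follows. This is an illustrative example, assuming a standard transformers workflow; the prompt template shown follows the usual Koala conversation format and should be checked against the upstream model card. Running it requires a GPU with enough memory for the 7B fp16 weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Koala-7B-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights, as shipped
    device_map="auto",           # place layers across available GPUs
    trust_remote_code=True,      # loads the extended-context modeling code
)

# Koala-style prompt format (illustrative; verify against the model card).
prompt = "BEGINNING OF CONVERSATION: USER: Summarize the benefits of an 8K context window. GPT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the extended context is implemented in custom modeling code rather than in the stock LLaMA implementation, omitting trust_remote_code=True would load the model with its default (shorter) position handling.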

Good For

  • Applications requiring a large context window for understanding and generating long-form content.
  • Developers looking for an fp16 PyTorch model as a base for further fine-tuning or conversions.
  • Research into extended context capabilities in 7B parameter models.