TheBloke/Robin-7B-v2-SuperHOT-8K-fp16


TheBloke/Robin-7B-v2-SuperHOT-8K-fp16 is a 7 billion parameter causal language model, created by OptimalScale and further developed by Kaio Ken and TheBloke. It merges OptimalScale's Robin 7B v2 with Kaio Ken's SuperHOT 8K LoRA, extending the context window to 8192 tokens. The SuperHOT LoRA was trained with a focus on NSFW content, and the merged fp16 model is intended for GPU inference, with the extended context supporting longer conversational and generative tasks.

Model Overview

This model, TheBloke/Robin-7B-v2-SuperHOT-8K-fp16, is a 7 billion parameter language model based on OptimalScale's Robin 7B v2. It has been enhanced by merging with Kaio Ken's SuperHOT 8K LoRA, which extends its context window to 8192 tokens. This allows it to process and generate much longer sequences of text than the base model's original 2048-token context, which is why a positional-embedding compression factor of 4 is used (see the usage notes below).

Key Capabilities

  • Extended Context Window: Achieves an 8K context length, enabling more coherent and detailed long-form generation and understanding.
  • NSFW Focus: The SuperHOT LoRA was specifically trained with a focus on NSFW content.
  • FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and further conversions.
  • Flexible Configuration: The config.json sets the sequence length to 8192 by default, but this can be lowered if needed (see the configuration sketch after this list).
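
A minimal sketch of checking and adjusting that default, assuming the checkpoint exposes the standard LLaMA max_position_embeddings field; the repo id is taken from the model name above:

```python
from transformers import AutoConfig

model_id = "TheBloke/Robin-7B-v2-SuperHOT-8K-fp16"

# Fetch only the config and inspect the default sequence length (expected: 8192).
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.max_position_embeddings)

# The value can be lowered before loading the full weights, e.g. to reduce memory use;
# pass the modified config via `config=config` to AutoModelForCausalLM.from_pretrained.
config.max_position_embeddings = 4096
```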

Usage Notes

  • Loading with Hugging Face Transformers requires trust_remote_code=True so that the custom code shipped with the checkpoint can apply the context scaling during inference (see the loading sketch after this list).
  • For the exllama or exllama_hf loaders (e.g. in text-generation-webui), pass --max_seq_len 8192 --compress_pos_emb 4.
  • The model is a merge of OptimalScale/robin-7b-v2-delta and kaiokendev/superhot-7b-8k-no-rlhf-test.
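
Building on the notes above, here is a minimal fp16 loading and generation sketch using the Hugging Face Transformers API, assuming a CUDA GPU and the accelerate package (for device_map="auto"); the prompt template is illustrative only and not taken from the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Robin-7B-v2-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code=True lets the custom code shipped with the checkpoint apply the
# position-embedding scaling needed for the extended 8K context.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights for GPU inference
    device_map="auto",
    trust_remote_code=True,
)

# Illustrative prompt; check the base Robin model card for its exact chat format.
prompt = "###Human: Explain why an 8K context window helps with long documents.\n###Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```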