TheBloke/Robin-7B-v2-SuperHOT-8K-fp16


TheBloke/Robin-7B-v2-SuperHOT-8K-fp16 is a 7 billion parameter causal language model, created by OptimalScale and further developed by Kaio Ken and TheBloke. It merges OptimalScale's Robin 7B v2 with Kaio Ken's SuperHOT 8K LoRA, extending the context window to 8192 tokens. The SuperHOT LoRA was trained with a focus on NSFW content, and the merged fp16 model is intended for GPU inference, with the extended context supporting longer conversational and generative tasks.

Model Overview

This model, TheBloke/Robin-7B-v2-SuperHOT-8K-fp16, is a 7 billion parameter language model based on OptimalScale's Robin 7B v2. It has been enhanced by merging with Kaio Ken's SuperHOT 8K LoRA, which extends its context window to 8192 tokens. This allows it to process and generate much longer sequences of text than the base model's original 2048-token context, which is why a positional-embedding compression factor of 4 is used (see the usage notes below).

Key Capabilities

  • Extended Context Window: Achieves an 8K context length, enabling more coherent and detailed long-form generation and understanding.
  • NSFW Focus: The SuperHOT LoRA was specifically trained with a focus on NSFW content.
  • FP16 Precision: Provided in fp16 PyTorch format, suitable for GPU inference and further conversions.
  • Flexible Configuration: The config.json sets the sequence length to 8192 by default, but this can be lowered if needed (see the configuration sketch after this list).
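
A minimal sketch of checking and adjusting that default, assuming the checkpoint exposes the standard LLaMA max_position_embeddings field; the repo id is taken from the model name above:

```python
from transformers import AutoConfig

model_id = "TheBloke/Robin-7B-v2-SuperHOT-8K-fp16"

# Fetch only the config and inspect the default sequence length (expected: 8192).
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.max_position_embeddings)

# The value can be lowered before loading the full weights, e.g. to reduce memory use;
# pass the modified config via `config=config` to AutoModelForCausalLM.from_pretrained.
config.max_position_embeddings = 4096
```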

Usage Notes

  • Loading with Hugging Face Transformers requires trust_remote_code=True so that the custom code shipped with the checkpoint can apply the context scaling during inference (see the loading sketch after this list).
  • For the exllama or exllama_hf loaders (e.g. in text-generation-webui), pass --max_seq_len 8192 --compress_pos_emb 4.
  • The model is a merge of OptimalScale/robin-7b-v2-delta and kaiokendev/superhot-7b-8k-no-rlhf-test.
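
Building on the notes above, here is a minimal fp16 loading and generation sketch using the Hugging Face Transformers API, assuming a CUDA GPU and the accelerate package (for device_map="auto"); the prompt template is illustrative only and not taken from the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Robin-7B-v2-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# trust_remote_code=True lets the custom code shipped with the checkpoint apply the
# position-embedding scaling needed for the extended 8K context.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights for GPU inference
    device_map="auto",
    trust_remote_code=True,
)

# Illustrative prompt; check the base Robin model card for its exact chat format.
prompt = "###Human: Explain why an 8K context window helps with long documents.\n###Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```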