TheBloke/Selfee-7B-SuperHOT-8K-fp16
TheBloke/Selfee-7B-SuperHOT-8K-fp16 is a 7 billion parameter language model, created by merging Kaist AI's Selfee 7B with Kaio Ken's SuperHOT 8K LoRA. This model is designed for extended context applications, supporting an 8K context length during inference. It is provided in fp16 PyTorch format, suitable for GPU inference and further conversions, and is particularly noted for its NSFW-focused fine-tuning.
Model Overview
This model, TheBloke/Selfee-7B-SuperHOT-8K-fp16, is a 7 billion parameter language model resulting from the merge of Kaist AI's Selfee 7B base model with Kaio Ken's SuperHOT 8K LoRA. It is distributed in fp16 PyTorch format, optimized for GPU inference.
Key Capabilities
- Extended Context Window: Achieves an 8K (8,192-token) context length during inference, enabled by the SuperHOT 8K merge; loading requires `trust_remote_code=True`.
- NSFW Focus: The SuperHOT LoRA component was specifically trained with an NSFW focus, making this model potentially suitable for applications requiring such content generation.
- Flexible Configuration: The `config.json` sets the sequence length to 8192 by default; it can be lowered to 4096 if a smaller sequence length is desired.
- Conversion Ready: The fp16 PyTorch format serves as a base for further conversions into other formats such as GPTQ (4-bit) or GGML (2-8 bit) for various hardware setups.
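SuperHOT-style merges typically reach the longer window via linear RoPE position interpolation: token positions are compressed by the ratio of the original to the extended context so the model never sees rotary angles outside its pretraining range. The sketch below illustrates that idea in isolation; the constants and function names are illustrative, not taken from this model's code.

```python
import math

# Illustrative sketch of linear RoPE position interpolation, the
# technique SuperHOT-style merges use to stretch a 2048-token
# pretraining window to 8192 tokens. Names here are assumptions,
# not the model's actual implementation.

ORIG_CTX = 2048      # base LLaMA pretraining context
EXTENDED_CTX = 8192  # target context after the SuperHOT merge
SCALE = ORIG_CTX / EXTENDED_CTX  # 0.25: positions are compressed 4x

def rope_angles(position: float, dim: int = 128,
                base: float = 10000.0, scale: float = 1.0):
    """Rotary-embedding angles for one position, optionally interpolated."""
    pos = position * scale  # linear interpolation squeezes positions
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

# Position 8191 with scaling lands on the same angles the model saw
# for position 8191 * 0.25 = 2047.75 during pretraining, so the
# extended positions stay inside the pretrained distribution.
scaled = rope_angles(8191, scale=SCALE)
reference = rope_angles(8191 * SCALE)
assert all(math.isclose(a, b) for a, b in zip(scaled, reference))
```

This is also why `trust_remote_code=True` comes up: the scaled rotary embedding has to be patched in at load time rather than being part of the stock architecture.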
Good for
- Applications requiring a 7B parameter model with an extended 8K context window.
- Use cases that benefit from a model with NSFW-focused fine-tuning.
- Developers looking for a PyTorch fp16 model for GPU inference or as a starting point for custom quantizations and conversions.
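To gauge whether the fp16 release fits your hardware or whether a quantized conversion is needed, a back-of-envelope weight-memory estimate helps. The figures below cover weights only; the KV cache and activations, which grow with the 8K context, add more on top.

```python
# Rough weight-memory estimate for a 7B-parameter model in the formats
# mentioned above (fp16 vs. 4-bit GPTQ). Weights only: KV cache and
# activations for an 8K context are not included.

PARAMS = 7_000_000_000

def weights_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return PARAMS * bits_per_param / 8 / 1024**3

fp16_gb = weights_gb(16)   # ~13 GiB: wants a 16 GB+ GPU
gptq4_gb = weights_gb(4)   # ~3.3 GiB: fits common consumer GPUs
print(f"fp16: {fp16_gb:.1f} GiB, 4-bit GPTQ: {gptq4_gb:.1f} GiB")
```

The 4x gap is why the fp16 release is positioned as a conversion base: most deployments quantize it down rather than serving fp16 directly.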