TheBloke/Baize-v2-13B-SuperHOT-8K-fp16
Text generation · Concurrency cost: 1 · Model size: 13B · Quant: FP8 · Context length: 4K · License: other · Architecture: Transformer
TheBloke/Baize-v2-13B-SuperHOT-8K-fp16 is a 13 billion parameter language model, a merge of Project Baize's Baize 13B v2 and Kaio Ken's SuperHOT 8K. This model is designed for extended context, supporting up to 8192 tokens during inference. It is an unquantized fp16 PyTorch model, suitable for GPU inference and further conversions, and is particularly noted for its NSFW-focused LoRA integration.
Model Overview
This model, Baize-v2-13B-SuperHOT-8K-fp16, is a 13 billion parameter language model created by merging Project Baize's Baize 13B v2 with Kaio Ken's SuperHOT 8K. It is provided in fp16 PyTorch format, making it suitable for direct GPU inference and as a base for further quantization or conversions.
Key Capabilities
- Extended Context Window: Achieves an 8K (8192-token) context length during inference, enabled by the SuperHOT 8K integration and the appropriate configuration (`trust_remote_code=True` or a monkey patch).
- Base Model: Built upon Project Baize's Baize 13B v2, a chat model fine-tuned with LoRA using supervised fine-tuning (SFT) and self-distillation with feedback (SDF).
- SuperHOT Integration: Incorporates Kaio Ken's SuperHOT 13B LoRA, an NSFW-focused LoRA trained with a scaling factor of 4 for 8K context.
- Flexible Configuration: The `config.json` sets the sequence length to 8192 by default; it can be lowered to 4096 if a smaller context window is desired.
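The configuration points above can be sketched as follows. This is a minimal, hedged example assuming the `transformers` and `torch` packages; the `set_context_length` helper and the generation prompt are illustrative, not part of the model card. Note the full fp16 download is roughly 26 GB and inference requires a correspondingly large GPU.

```python
# Sketch: adjusting the sequence length and loading the fp16 weights.
# Assumptions: standard Hugging Face `transformers` API; the sequence-length
# field in a LLaMA-style config.json is `max_position_embeddings`.

def set_context_length(config: dict, n_ctx: int = 8192) -> dict:
    """Return a copy of a config.json-style dict with the sequence length set.

    The card notes config.json ships with 8192 by default; pass 4096 here
    if a smaller context window is preferred.
    """
    config = dict(config)
    config["max_position_embeddings"] = n_ctx
    return config


if __name__ == "__main__":
    # Actual inference (hypothetical usage; needs a GPU with ~28 GB free memory):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/Baize-v2-13B-SuperHOT-8K-fp16"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # unquantized fp16 weights
        device_map="auto",
        trust_remote_code=True,     # enables the SuperHOT scaled-RoPE code for 8K context
    )
    prompt = "Summarize the benefits of an 8K context window."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If `trust_remote_code=True` is not an option in your deployment, the alternative the card mentions is a monkey patch that applies the same RoPE scaling before loading.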
Good For
- Applications requiring a large context window for detailed conversations or document processing.
- Use cases that benefit from the specific NSFW-focused fine-tuning of the SuperHOT LoRA.
- Developers looking for an unquantized fp16 model for custom deployments, further fine-tuning, or conversion to other formats (e.g., GPTQ, GGML).