TheBloke/Baize-v2-13B-SuperHOT-8K-fp16
Text generation · Concurrency cost: 1 · Model size: 13B · Quant: FP8 · Context length: 4K · License: other · Architecture: Transformer
TheBloke/Baize-v2-13B-SuperHOT-8K-fp16 is a 13 billion parameter language model, a merge of Project Baize's Baize 13B v2 and Kaio Ken's SuperHOT 8K. This model is designed for extended context, supporting up to 8192 tokens during inference. It is an unquantized fp16 PyTorch model, suitable for GPU inference and further conversions, and is particularly noted for its NSFW-focused LoRA integration.
Model Overview
This model, Baize-v2-13B-SuperHOT-8K-fp16, is a 13 billion parameter language model created by merging Project Baize's Baize 13B v2 with Kaio Ken's SuperHOT 8K. It is provided in fp16 PyTorch format, making it suitable for direct GPU inference and as a base for further quantization or conversions.
Key Capabilities
- Extended Context Window: Achieves an 8K (8192-token) context length during inference, enabled by the SuperHOT 8K integration and the appropriate configuration (`trust_remote_code=True` or a monkey patch).
- Base Model: Built upon Project Baize's Baize 13B v2, a chat model fine-tuned with LoRA using supervised fine-tuning (SFT) and self-distillation with feedback (SDF).
- SuperHOT Integration: Incorporates Kaio Ken's SuperHOT 13B LoRA, an NSFW-focused LoRA trained with a scaling factor of 4 for 8K context.
- Flexible Configuration: The `config.json` sets the sequence length to 8192 by default; it can be lowered to 4096 if a smaller context window is desired.
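The configuration points above can be sketched as follows. This is a minimal, hedged example assuming the `transformers` and `torch` packages; the `set_context_length` helper and the generation prompt are illustrative, not part of the model card. Note the full fp16 download is roughly 26 GB and inference requires a correspondingly large GPU.

```python
# Sketch: adjusting the sequence length and loading the fp16 weights.
# Assumptions: standard Hugging Face `transformers` API; the sequence-length
# field in a LLaMA-style config.json is `max_position_embeddings`.

def set_context_length(config: dict, n_ctx: int = 8192) -> dict:
    """Return a copy of a config.json-style dict with the sequence length set.

    The card notes config.json ships with 8192 by default; pass 4096 here
    if a smaller context window is preferred.
    """
    config = dict(config)
    config["max_position_embeddings"] = n_ctx
    return config


if __name__ == "__main__":
    # Actual inference (hypothetical usage; needs a GPU with ~28 GB free memory):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/Baize-v2-13B-SuperHOT-8K-fp16"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # unquantized fp16 weights
        device_map="auto",
        trust_remote_code=True,     # enables the SuperHOT scaled-RoPE code for 8K context
    )
    prompt = "Summarize the benefits of an 8K context window."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If `trust_remote_code=True` is not an option in your deployment, the alternative the card mentions is a monkey patch that applies the same RoPE scaling before loading.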
Good For
- Applications requiring a large context window for detailed conversations or document processing.
- Use cases that benefit from the specific NSFW-focused fine-tuning of the SuperHOT LoRA.
- Developers looking for an unquantized fp16 model for custom deployments, further fine-tuning, or conversion to other formats (e.g., GPTQ, GGML).