TheBloke/Baize-v2-7B-SuperHOT-8K-fp16

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4K · License: other · Architecture: Transformer

TheBloke/Baize-v2-7B-SuperHOT-8K-fp16 is a 7 billion parameter LLaMA-based causal language model, created by TheBloke by merging Project Baize's Baize 7B v2 with Kaio Ken's SuperHOT 8K. This model is specifically designed for extended context applications, supporting an 8K context length through a merged LoRA and custom scaling. It is optimized for GPU inference and serves as a base for further conversions, excelling in detailed conversational tasks with an emphasis on longer interactions.


Model Overview

This model, TheBloke/Baize-v2-7B-SuperHOT-8K-fp16, is a 7 billion parameter LLaMA-based language model. It is a merge of two distinct projects:

  • Project Baize's Baize 7B v2: An open-source chat model fine-tuned with LoRA, utilizing supervised fine-tuning (SFT) and self-distillation with feedback (SDF). Baize models are designed for detailed conversational AI, requiring a specific prompt format ([|Human|] and [|AI|]).
  • Kaio Ken's SuperHOT 8K: A prototype LoRA that extends the context window to 8K tokens, using a technique described in the author's GitHub blog post. This version was trained without RLHF.
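The context-extension technique behind SuperHOT compresses position indices beyond the base model's trained window back into the trained range before they reach the rotary embeddings. A minimal sketch of that idea, with illustrative numbers (LLaMA's original 2048-token window extended to 8192; the exact scaling in the merged model is defined by its remote code, not this snippet):

```python
# Position-interpolation sketch: instead of feeding raw integer positions
# 0..8191 to the rotary embedding, scale them down by a constant factor so
# every position lands inside the range the base model was trained on.
ORIGINAL_CTX = 2048            # LLaMA's trained context window
EXTENDED_CTX = 8192            # target 8K window
SCALE = ORIGINAL_CTX / EXTENDED_CTX  # 0.25

def scaled_positions(seq_len):
    """Fractional positions used in place of raw indices for RoPE."""
    return [i * SCALE for i in range(seq_len)]

# The last position of an 8K sequence maps back inside the trained range:
print(scaled_positions(EXTENDED_CTX)[-1])  # 2047.75
```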

Key Capabilities & Features

  • Extended Context Window: Supports an 8K context length, enabled by the SuperHOT 8K merge and custom scaling during inference.
  • Conversational AI: Inherits the conversational fine-tuning from Project Baize, making it suitable for detailed chat interactions.
  • LLaMA Base: Built upon the LLaMA architecture, providing a robust foundation.
  • FP16 Format: Provided in fp16 pytorch format, ideal for GPU inference and as a base for further quantization or conversions.
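The features above translate into a short loading routine. This is a minimal sketch assuming the `transformers`, `accelerate`, and `torch` packages are installed; the repo id, the 8192 `max_position_embeddings` value, and the need for `trust_remote_code=True` come from this card, while the rest is standard Hugging Face usage rather than a prescribed recipe:

```python
MODEL_ID = "TheBloke/Baize-v2-7B-SuperHOT-8K-fp16"

def load_model(model_id: str = MODEL_ID):
    # Imports are deferred so the sketch can be read (and its shape tested)
    # without the heavyweight dependencies installed.
    import torch
    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    config.max_position_embeddings = 8192  # enable the full 8K window

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        config=config,
        torch_dtype=torch.float16,  # weights ship in fp16
        device_map="auto",          # place layers on available GPUs
        trust_remote_code=True,     # loads the SuperHOT scaled-RoPE code
    )
    return tokenizer, model
```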

Usage Considerations

  • Prompt Format: When using the Baize component, adhere to the [|Human|] and [|AI|] prompt format for optimal performance.
  • Context Scaling: Requires trust_remote_code=True or a monkey patch to properly utilize the 8K context length, with config.max_position_embeddings set to 8192.
  • No RLHF: The SuperHOT component was trained without RLHF, which may influence its response characteristics.
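The `[|Human|]` / `[|AI|]` prompt format noted above can be built with plain string assembly. Only the turn markers come from the card; the helper name, the newline separators, and leaving the final AI turn open are illustrative assumptions:

```python
def build_prompt(history, user_message):
    """Assemble a Baize-style prompt.

    history: list of (human, ai) turn pairs from earlier in the chat.
    """
    parts = []
    for human, ai in history:
        parts.append(f"[|Human|]{human}")
        parts.append(f"[|AI|]{ai}")
    parts.append(f"[|Human|]{user_message}")
    parts.append("[|AI|]")  # leave the AI turn open for generation
    return "\n".join(parts)

prompt = build_prompt(
    [("Hi!", "Hello, how can I help?")],
    "What is LoRA?",
)
print(prompt)
```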