TheBloke/Tulu-7B-SuperHOT-8K-fp16

Text Generation · 7B parameters · fp16 · 8K context · License: other · Transformer architecture

TheBloke/Tulu-7B-SuperHOT-8K-fp16 is a 7-billion-parameter LLaMA-based model, created by TheBloke by merging Allen AI's Tulu 7B with Kaio Ken's SuperHOT 8K LoRA. This fp16 PyTorch model is notable for its extended 8K context length, achieved through RoPE position-interpolation scaling at inference time. It is primarily intended for GPU inference and as a base for further conversions (e.g. to GPTQ or GGML), offering enhanced context handling for long-form language generation tasks.


Overview

This model, TheBloke/Tulu-7B-SuperHOT-8K-fp16, is a 7-billion-parameter LLaMA-based language model. It is a merge of Allen AI's Tulu 7B and Kaio Ken's SuperHOT 8K LoRA, provided in fp16 PyTorch format. Its primary differentiator is a significantly extended context window of 8192 tokens, achieved by combining the SuperHOT 8K LoRA with RoPE position-interpolation scaling at inference time, which is activated by loading the model with `trust_remote_code=True`.
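The position-interpolation idea behind SuperHOT can be sketched in a few lines: rotary-embedding (RoPE) positions are compressed by a fixed factor (here 2048 / 8192 = 0.25) so that positions up to 8192 fall inside the 2048-token range the base LLaMA model was trained on. This is an illustrative sketch of the technique, not code taken from the model's repository; the function names are hypothetical.

```python
def rope_frequencies(dim, base=10000.0):
    # Inverse frequencies for each pair of rotary dimensions.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def rope_angles(position, dim, scale=1.0, base=10000.0):
    # Position interpolation: multiply the position by `scale` (< 1)
    # so extended positions are squeezed into the trained range.
    p = position * scale
    return [p * f for f in rope_frequencies(dim, base)]

# With scale = 2048 / 8192 = 0.25, token position 8191 is mapped to an
# effective position of 2047.75, inside the base model's 0..2047 range.
scale = 2048 / 8192
angles_far = rope_angles(8191, dim=128, scale=scale)
angles_equiv = rope_angles(8191 * scale, dim=128)  # same effective position
```

The key property is that a scaled far position produces exactly the same rotary angles as the corresponding unscaled near position, which is why the base model's learned attention patterns still apply.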

Key Capabilities

  • Extended Context Length: Supports an 8K (8192 token) context window, allowing for processing and generating much longer sequences than the base LLaMA's standard 2K context.
  • Instruction Following: Built upon Allen AI's Tulu 7B, which was fine-tuned on a diverse mixture of instruction datasets including FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT.
  • GPU Inference: Provided in fp16 PyTorch format, optimized for GPU inference and suitable for further conversions to other formats like GPTQ or GGML.
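As a rough guide to the GPU memory needed just to hold the fp16 weights (excluding activations and the KV cache, which grow with the 8K context), a back-of-the-envelope estimate:

```python
def model_memory_gib(n_params, bytes_per_param):
    # Weight storage only: parameter count times bytes per parameter,
    # converted to GiB.
    return n_params * bytes_per_param / 1024**3

fp16_gib = model_memory_gib(7_000_000_000, 2)  # fp16: 2 bytes/param, ~13 GiB
fp32_gib = model_memory_gib(7_000_000_000, 4)  # fp32: 4 bytes/param, ~26 GiB
```

This is why the fp16 release targets GPUs in the 16 GB+ class, and why further quantization (GPTQ, GGML) is attractive for smaller cards.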

Good For

  • Applications requiring processing or generating long-form content, such as detailed summaries, extended conversations, or complex document analysis.
  • Developers looking for a base fp16 model with an extended context window for further fine-tuning or quantization.
  • Use cases benefiting from robust instruction-following capabilities derived from its Tulu 7B foundation.