TheBloke/Vicuna-7B-CoT-SuperHOT-8K-fp16

Text Generation · Model Size: 7B · Quant: fp16 · Ctx Length: 8K · License: other · Architecture: Transformer

TheBloke/Vicuna-7B-CoT-SuperHOT-8K-fp16 is a 7 billion parameter language model created by merging Kevin Pro's Vicuna 7B CoT with Kaio Ken's SuperHOT 8K. This fp16 PyTorch model is designed for GPU inference and features an extended context length of 8192 tokens. It is optimized for conversational tasks with Chain-of-Thought capabilities and enhanced long-context understanding.

Model Overview

TheBloke/Vicuna-7B-CoT-SuperHOT-8K-fp16 merges Kevin Pro's Vicuna 7B CoT with Kaio Ken's SuperHOT 8K LoRA and is distributed as fp16 PyTorch weights intended for GPU inference.

Key Capabilities

  • Extended Context Window: Achieves an 8K (8192-token) context, four times the base model's 2048-token window, by integrating Kaio Ken's SuperHOT 8K LoRA; loading the custom modeling code requires trust_remote_code=True (see the example under Usage Notes).
  • Chain-of-Thought (CoT) Enhancement: Incorporates Kevin Pro's Vicuna 7B CoT, which is specifically fine-tuned to improve Chain-of-Thought reasoning capabilities.
  • Flexible Configuration: config.json defaults to a sequence length of 8192 but can be lowered to 4096 if a shorter context is desired.
  • Automatic Scaling: The bundled modeling code sets the RoPE scale parameter from max_position_embeddings, e.g. scale=4 for 8192 tokens (a sketch of this derivation follows the list).
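
The scale applied by the modeling code is the ratio between the configured context length and LLaMA's original 2048-token pretraining window. A minimal sketch of that derivation, assuming this relationship holds and using illustrative names rather than identifiers from the repository:

```python
# Illustrative sketch of how the bundled modeling code derives the
# RoPE position-interpolation scale. BASE_CTX and rope_scale are
# hypothetical names, not identifiers from the actual repo.
BASE_CTX = 2048  # original LLaMA pretraining context length

def rope_scale(max_position_embeddings: int) -> int:
    # Positions are compressed by this factor, interpolating e.g.
    # 8192 positions into the original 2048-position range.
    return max_position_embeddings // BASE_CTX

assert rope_scale(8192) == 4  # the config.json default
assert rope_scale(4096) == 2  # if the sequence length is lowered
```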

Good For

  • Long-form text generation: Ideal for applications requiring extensive context understanding and generation, such as detailed conversations, document analysis, or creative writing with complex narratives.
  • Reasoning tasks: Benefits from the Chain-of-Thought fine-tuning, making it suitable for tasks that require multi-step reasoning.
  • GPU-based inference: Ships as fp16 PyTorch weights, so it runs best on GPUs with sufficient VRAM (see the rough memory estimate below).
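
As a rough guide to GPU sizing, 7B parameters at fp16 take about 14 GB for the weights alone, and the KV cache grows with context length. A back-of-the-envelope sketch, where the LLaMA-7B architecture constants (32 layers, hidden size 4096) are assumptions about the base model:

```python
# Back-of-the-envelope VRAM estimate for this model at fp16.
# LAYERS and HIDDEN are standard LLaMA-7B values, assumed here.
PARAMS = 7e9
BYTES_FP16 = 2
LAYERS, HIDDEN = 32, 4096

weights_gb = PARAMS * BYTES_FP16 / 1e9                 # ~14 GB of weights
kv_bytes_per_token = LAYERS * 2 * HIDDEN * BYTES_FP16  # K and V per token
kv_cache_gb = kv_bytes_per_token * 8192 / 1e9          # ~4.3 GB at the full 8K context

print(f"weights ≈ {weights_gb:.1f} GB, KV cache at 8K ≈ {kv_cache_gb:.1f} GB")
```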

Usage Notes

To use the full 8K context with the transformers loader, pass trust_remote_code=True so the custom RoPE-scaled modeling code is executed at load time. With the exllama or exllama_hf loaders, pass --max_seq_len 8192 --compress_pos_emb 4 instead.
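
For the transformers path, a minimal loading-and-generation sketch follows; the Vicuna-style USER/ASSISTANT prompt and the generation settings are assumptions for illustration, not values taken from the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Vicuna-7B-CoT-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights ship as fp16
    device_map="auto",          # place layers on available GPU(s); needs accelerate
    trust_remote_code=True,     # required for the 8K RoPE-scaled modeling code
)

# Vicuna-style prompt; the exact template is an assumption here.
prompt = "USER: Explain, step by step, why the sky is blue.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If a 4096-token context is sufficient, lowering max_position_embeddings in config.json to 4096 before loading reduces the interpolation scale to 2 accordingly.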