TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16
TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16 is a 13-billion-parameter auto-regressive language model, published by TheBloke and based on the LLaMA architecture. It merges LmSys' Vicuna 13B v1.3 with Kaio Ken's SuperHOT 8K LoRA to provide an extended context length of 8192 tokens. It is intended for research and hobbyist use in natural language processing, machine learning, and artificial intelligence, particularly for chat-assistant applications that require longer-context understanding.
Overview
This model merges LmSys' Vicuna 13B v1.3 with Kaio Ken's SuperHOT 8K LoRA and is configured for a context length of 8192 tokens. The base Vicuna model is a chat assistant fine-tuned from LLaMA on user-shared conversations collected from ShareGPT.
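
The 8K context comes from SuperHOT's rotary-position scaling: position indices are compressed so that sequences of up to 8192 tokens fall within the positional range the base model saw during pre-training. The snippet below is only an illustrative sketch of that idea, not the repository's actual implementation; the head dimension of 128 (LLaMA 13B) and the compression factor of 4 (scale 0.25, i.e. 2048 extended to 8192) are stated assumptions.

```python
# Illustrative sketch of scaled rotary position embeddings (RoPE), the idea
# behind SuperHOT's 8K context; not the code shipped in the repository.
import torch

def scaled_rope_angles(seq_len: int, dim: int = 128, base: float = 10000.0, scale: float = 0.25):
    # Standard RoPE inverse frequencies for one attention head of width `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Compress the position index: position 8191 * 0.25 ≈ 2048, i.e. still inside
    # the positional range LLaMA was pre-trained on.
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, inv_freq)  # angles used to rotate query/key vectors

angles = scaled_rope_angles(seq_len=8192)
print(angles.shape)  # torch.Size([8192, 64])
```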
Key Capabilities
- Extended Context Window: Leverages an 8192-token context length, four times the base Vicuna v1.3's 2048 tokens, enabling the model to process and generate longer, more coherent responses (see the loading sketch after this list).
- Chat Assistant: Inherits the chat-assistant capabilities of the Vicuna v1.3 model, which was trained on roughly 125K user-shared conversations collected from ShareGPT.
- Research and Development: Primarily intended for researchers and hobbyists in NLP, machine learning, and AI for exploring large language models and chatbots.
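
A minimal loading sketch with Hugging Face transformers is shown below. The repository id and the fp16 weights come from this card; the generation settings and the Vicuna prompt template are illustrative. Using the full 8K context generally relies on the RoPE scaling described above, either via custom code bundled in the repo (hence trust_remote_code=True) or via an inference backend such as ExLlama with its position-compression option set to 4; treat these details as assumptions to verify against the repository files.

```python
# Minimal loading sketch; requires the transformers, accelerate and torch packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights: roughly 26 GB for 13B parameters
    device_map="auto",           # spread layers across available GPUs/CPU (needs accelerate)
    trust_remote_code=True,      # allows any bundled RoPE-scaling code to be used (assumption)
)

# Vicuna v1.3 prompt template.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: Summarise the plot of Hamlet in three sentences. ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```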
Training Details
The SuperHOT 8K component was trained by Kaio Ken as a LoRA on roughly 1200 samples, with a learning rate of 3e-4 for 3 epochs, targeting the q_proj, k_proj, v_proj, and o_proj attention projection modules. The Vicuna v1.3 model was fine-tuned from LLaMA with supervised instruction fine-tuning on ShareGPT conversations.
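
For readers who want to reproduce a comparable setup, the sketch below shows how such a LoRA could be configured with the peft library. The target modules, learning rate, and epoch count mirror the description above; the rank, alpha, dropout, and the base model id lmsys/vicuna-13b-v1.3 are assumptions for illustration, not details taken from the original training run.

```python
# Hedged sketch of a LoRA setup similar to the one described above; hyperparameters
# other than the target modules, learning rate, and epochs are illustrative.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-13b-v1.3",     # assumed base checkpoint for illustration
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=4,                         # assumed rank
    lora_alpha=8,                # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # from the card
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# Training would then run with, e.g., transformers.Trainer using
# learning_rate=3e-4 and num_train_epochs=3 on the ~1200 long-context samples.
```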