TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16
TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16 is a 13-billion-parameter auto-regressive language model, published by TheBloke and based on the LLaMA architecture. It merges LmSys' Vicuna 13B v1.3 with Kaio Ken's SuperHOT 8K LoRA to provide an extended context length of 8192 tokens. It is intended for research and hobbyist use in natural language processing, machine learning, and artificial intelligence, particularly for chat-assistant applications that require longer-context understanding.
Overview
This model merges LmSys' Vicuna 13B v1.3 with Kaio Ken's SuperHOT 8K LoRA and is configured for a context length of 8192 tokens. The base Vicuna model is a chat assistant fine-tuned from LLaMA on user-shared conversations collected from ShareGPT.
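
The 8K context comes from SuperHOT's rotary-position scaling: position indices are compressed so that sequences of up to 8192 tokens fall within the positional range the base model saw during pre-training. The snippet below is only an illustrative sketch of that idea, not the repository's actual implementation; the head dimension of 128 (LLaMA 13B) and the compression factor of 4 (scale 0.25, i.e. 2048 extended to 8192) are stated assumptions.

```python
# Illustrative sketch of scaled rotary position embeddings (RoPE), the idea
# behind SuperHOT's 8K context; not the code shipped in the repository.
import torch

def scaled_rope_angles(seq_len: int, dim: int = 128, base: float = 10000.0, scale: float = 0.25):
    # Standard RoPE inverse frequencies for one attention head of width `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Compress the position index: position 8191 * 0.25 ≈ 2048, i.e. still inside
    # the positional range LLaMA was pre-trained on.
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, inv_freq)  # angles used to rotate query/key vectors

angles = scaled_rope_angles(seq_len=8192)
print(angles.shape)  # torch.Size([8192, 64])
```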
Key Capabilities
- Extended Context Window: Leverages an 8192-token context length, four times the base Vicuna v1.3's 2048 tokens, enabling the model to process and generate longer, more coherent responses (see the loading sketch after this list).
- Chat Assistant: Inherits the chat-assistant capabilities of the Vicuna v1.3 model, which was trained on roughly 125K user-shared conversations collected from ShareGPT.
- Research and Development: Primarily intended for researchers and hobbyists in NLP, machine learning, and AI for exploring large language models and chatbots.
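
A minimal loading sketch with Hugging Face transformers is shown below. The repository id and the fp16 weights come from this card; the generation settings and the Vicuna prompt template are illustrative. Using the full 8K context generally relies on the RoPE scaling described above, either via custom code bundled in the repo (hence trust_remote_code=True) or via an inference backend such as ExLlama with its position-compression option set to 4; treat these details as assumptions to verify against the repository files.

```python
# Minimal loading sketch; requires the transformers, accelerate and torch packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 weights: roughly 26 GB for 13B parameters
    device_map="auto",           # spread layers across available GPUs/CPU (needs accelerate)
    trust_remote_code=True,      # allows any bundled RoPE-scaling code to be used (assumption)
)

# Vicuna v1.3 prompt template.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: Summarise the plot of Hamlet in three sentences. ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```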
Training Details
The SuperHOT 8K component was trained by Kaio Ken as a LoRA on roughly 1200 samples, with a learning rate of 3e-4 for 3 epochs, targeting the q_proj, k_proj, v_proj, and o_proj attention projection modules. The Vicuna v1.3 model was fine-tuned from LLaMA with supervised instruction fine-tuning on ShareGPT conversations.
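
For readers who want to reproduce a comparable setup, the sketch below shows how such a LoRA could be configured with the peft library. The target modules, learning rate, and epoch count mirror the description above; the rank, alpha, dropout, and the base model id lmsys/vicuna-13b-v1.3 are assumptions for illustration, not details taken from the original training run.

```python
# Hedged sketch of a LoRA setup similar to the one described above; hyperparameters
# other than the target modules, learning rate, and epochs are illustrative.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-13b-v1.3",     # assumed base checkpoint for illustration
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=4,                         # assumed rank
    lora_alpha=8,                # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # from the card
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# Training would then run with, e.g., transformers.Trainer using
# learning_rate=3e-4 and num_train_epochs=3 on the ~1200 long-context samples.
```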