winglian/vicuna-13b-1_1-hf
The winglian/vicuna-13b-1_1-hf model is a 13-billion-parameter, auto-regressive language model based on the transformer architecture, fine-tuned from LLaMA by the Vicuna team (UC Berkeley, CMU, Stanford, and UC San Diego). It was trained on 70,000 user-shared conversations from ShareGPT, making it well suited to chatbot research and conversational AI. This version incorporates tokenization and separator refinements (using the EOS token "</s>" as the separator) for improved compatibility and cleaner generation stop criteria.
Model Overview
This model, winglian/vicuna-13b-1_1-hf, is a Hugging Face version of the Vicuna 13B 1.1 model. Vicuna is an open-source chatbot developed by a collaborative team from UC Berkeley, CMU, Stanford, and UC San Diego. It is an auto-regressive language model built upon the transformer architecture, specifically fine-tuned from the original LLaMA 13B model.
Key Characteristics & Updates
- Base Model: Fine-tuned from LLaMA 13B.
- Training Data: Trained on 70,000 user-shared conversations collected from ShareGPT.com, focusing on conversational abilities.
- Version 1.1 Enhancements: This v1.1 release includes significant updates to tokenization and separator handling. The separator was changed from "###" to the EOS token "</s>", which simplifies generation stop criteria and improves compatibility with various libraries.
- Improved Fine-tuning: The supervised fine-tuning loss computation was refined, contributing to better overall model quality.
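The switch from "###" to the EOS token "</s>" means each assistant turn ends exactly where generation should stop, so downstream code can rely on the tokenizer's EOS id instead of scanning for a custom stop string. A minimal sketch of how a v1.1-style conversation prompt can be assembled (the system-prompt wording here follows FastChat's v1.1 template and should be treated as an assumption, as should the helper name `build_prompt`):

```python
# Sketch of the Vicuna v1.1 conversation format: assistant turns are
# terminated by the EOS token "</s>" rather than the old "###" separator.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions."
)
EOS = "</s>"  # v1.1 separator; v1.0 used "###"

def build_prompt(turns):
    """Build a prompt from a list of (user, assistant-or-None) pairs.

    A trailing None assistant message leaves the prompt open for the
    model to complete the next assistant turn.
    """
    parts = [SYSTEM]
    for user, assistant in turns:
        parts.append(f" USER: {user} ASSISTANT:")
        if assistant is not None:
            # Close each completed assistant turn with the EOS separator.
            parts.append(f" {assistant}{EOS}")
    return "".join(parts)

prompt = build_prompt([("Hello!", "Hi there!"), ("Tell me a joke.", None)])
```

At generation time, stopping on the model's EOS token (e.g. via `eos_token_id` in Hugging Face `transformers`' `generate()`) is then sufficient; no custom "###" stop-string handling is needed.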
Intended Use Cases
- Research: Primarily intended for research purposes in large language models and chatbots.
- Hobbyist Projects: Suitable for hobbyists exploring natural language processing, machine learning, and artificial intelligence applications.
Licensing
The model is released under the Apache License 2.0.