CarperAI/stable-vicuna-13b-delta
CarperAI's StableVicuna-13B is a 13-billion-parameter model: Vicuna-13B v0 fine-tuned with Reinforcement Learning from Human Feedback (RLHF) via Proximal Policy Optimization (PPO). This LLaMA-based auto-regressive language model is optimized for conversational and instructional tasks. It was trained on a mix of human-generated and AI-generated datasets, including OpenAssistant Conversations, GPT4All Prompt Generations, and Alpaca, to improve its chat capabilities. StableVicuna-13B is intended for text generation, particularly in conversational contexts, and can be further fine-tuned for specific use cases.
StableVicuna-13B Overview
StableVicuna-13B, developed by CarperAI, is a 13-billion-parameter language model built on Vicuna-13B v0, which is itself a fine-tune of the LLaMA 13B transformer. What distinguishes it from its base model is the use of Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO) to fine-tune the model on a mix of conversational and instructional datasets.
Key Capabilities
- RLHF-Enhanced Conversations: Fine-tuned with PPO on OpenAssistant Conversations (OASST1), GPT4All Prompt Generations, and Alpaca, targeting conversational and instruction-following responses.
- LLaMA Architecture: Uses the decoder-only LLaMA transformer architecture inherited from Vicuna-13B v0 and generates text auto-regressively.
- Customizable: Can be further fine-tuned by users on their own data to improve performance on specific tasks.
Training Details
The model was trained by Duy Phung of CarperAI using the trlX library. Fine-tuning used a mix of three datasets: OASST1, GPT4All Prompt Generations, and Alpaca. The reward model used for RLHF was trained on OASST1 as well, together with Anthropic HH-RLHF and the Stanford Human Preferences Dataset, with the aim of aligning outputs with human preferences for helpfulness and harmlessness.
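The text above does not state which loss was used to train the reward model. As an illustration only, a common formulation for preference datasets, where each example pairs a preferred response with a rejected one, is the pairwise logistic (Bradley-Terry) loss. The sketch below assumes the reward model has already produced a scalar score for each response; the function names are hypothetical and are not part of trlX.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise logistic loss for one preference pair.

    Equals -log(sigmoid(chosen_score - rejected_score)), written as
    softplus(rejected_score - chosen_score) to avoid overflow.
    """
    return softplus(rejected_score - chosen_score)


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (chosen_score, rejected_score) pairs."""
    losses = [pairwise_preference_loss(c, r) for c, r in pairs]
    return sum(losses) / len(losses)
```

The loss shrinks as the score of the preferred response rises above that of the rejected one, so minimizing it pushes the reward model to rank preferred responses higher.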
Intended Use
StableVicuna-13B is primarily intended for text generation, with a focus on conversational applications. Users can apply it directly to chat-based tasks or adapt it through further fine-tuning for specialized use cases. Any use must comply with the model's non-commercial license.
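A minimal sketch of conversational text generation with the Hugging Face `transformers` library follows. Several things here are assumptions rather than facts stated above: the `### Human:` / `### Assistant:` prompt markers, the sampling settings, and `model_path`, a hypothetical local directory holding full model weights (the repository name suggests it ships delta weights, which would first need to be combined with the base LLaMA weights).

```python
def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt.

    The '### Human:' / '### Assistant:' markers are an assumed chat format.
    """
    return f"### Human: {user_message.strip()}\n### Assistant:"


def generate_reply(model_path: str, user_message: str, max_new_tokens: int = 256) -> str:
    """Generate one reply from weights stored at model_path (hypothetical path)."""
    # Imported lazily so build_prompt can be used without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
    model.to("cuda")  # assumes a GPU with enough memory for a 13B model in fp16

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to("cuda")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,  # illustrative sampling settings, not recommended values
        top_p=0.9,
    )
    # Drop the prompt tokens so only the newly generated reply is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_reply("path/to/merged-weights", "Summarize what RLHF is.")` would return the model's reply as a string, given a suitable GPU and a valid weights directory.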