pandaExplosion/opendata-chinese-llama2-chat
pandaExplosion/opendata-chinese-llama2-chat is a 13-billion-parameter chat model developed by pandaExplosion, based on Meta's Llama-2 architecture. It was fine-tuned through a combination of supervised fine-tuning (SFT), reward modeling, and Proximal Policy Optimization (PPO) on entirely open-source datasets, including Chinese-translated versions of Alpaca-CoT, Anthropic/hh-rlhf, and OpenAssistant/oasst1. The model is designed for conversational AI in Chinese and demonstrates competitive performance on the C-Eval benchmark.
Model Overview
opendata-chinese-llama2-chat is a 13-billion-parameter conversational model developed by pandaExplosion, built on Meta's Llama-2 base. It is part of a three-model suite, alongside an SFT model and a reward model, all trained on fully open-source datasets. The model is optimized specifically for Chinese-language interactions.
Key Capabilities
- Chinese Conversational AI: Fine-tuned for chat applications in Chinese, leveraging translated open-source datasets.
- Llama-2 Architecture: Builds on the robust Llama-2 foundation, enhanced through additional SFT, reward-modeling, and PPO training stages.
- Reinforcement Learning from Human Feedback (RLHF): Incorporates a reward model and PPO training for improved conversational quality and alignment.
- Open-source Training Data: Utilizes a diverse set of open-source datasets, including over 5 million instructions for SFT and 160k ranking pairs for reward modeling.
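Since the model follows the Llama-2 chat lineage, prompts are typically wrapped in Llama-2's `[INST]`/`<<SYS>>` chat template. Whether this checkpoint expects exactly that format is an assumption, so the sketch below is illustrative rather than authoritative:

```python
# Sketch of the standard Llama-2 chat prompt template. Whether this
# checkpoint expects exactly this format is an assumption; check the
# model card's example code before relying on it.

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap a single-turn user message in the Llama-2 chat template."""
    if system_prompt:
        user_message = f"{B_SYS}{system_prompt}{E_SYS}{user_message}"
    return f"{B_INST} {user_message} {E_INST}"

prompt = build_prompt("你好,请介绍一下你自己。",
                      system_prompt="你是一个乐于助人的中文助手。")
```

The resulting string would then be tokenized and passed to the model for generation.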
Training Details
The model underwent a multi-stage training process using DeepSpeed-Chat:
- Supervised Fine-Tuning (SFT): Trained for 2 epochs on over 5 million instructions from datasets like QingyiSi/Alpaca-CoT, with a sequence length of 4096 tokens.
- Reward Model Training: Trained for 2 epochs on 160k ranking pairs from Anthropic/hh-rlhf and OpenAssistant/oasst1 (and their translated versions), using a sequence length of 2048 tokens.
- PPO Stage: Trained for one epoch on 50k prompts, with a sequence length of 2048 tokens.
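The reward-modeling and PPO stages above optimize standard RLHF objectives. As a rough single-sample illustration (not the DeepSpeed-Chat implementation itself), the pairwise ranking loss and the clipped PPO policy loss can be sketched as:

```python
import math

def reward_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss used in reward modeling:
    -log sigmoid(r_chosen - r_rejected). Minimized when the
    chosen response scores higher than the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def ppo_clipped_loss(logp_new: float, logp_old: float,
                     advantage: float, clip_eps: float = 0.2) -> float:
    """Clipped PPO policy loss for one token/action (to be minimized).
    The probability ratio is clipped to [1 - eps, 1 + eps] to keep
    policy updates close to the old policy."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return -min(ratio * advantage, clipped * advantage)
```

These are the per-sample terms; the actual training averages them over batches of ranking pairs and rollout tokens respectively.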
Performance
On the C-Eval dataset (5-shot), the opendata-chinese-llama2-chat-13B model achieved a Test Average score of 40.5, outperforming both LLaMA-2-13B (36.6) and LLaMA-2-chat-13B (37.2) in the reported evaluations.
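For context, a 5-shot evaluation like the C-Eval run above prepends five solved exemplars to each test question. A minimal sketch of how such a prompt might be assembled (the exact C-Eval harness format is an assumption):

```python
def build_few_shot_prompt(examples, question, k=5):
    """Assemble a k-shot prompt: k solved (question, answer) pairs,
    followed by the unanswered test question. The 问题/答案 labels
    are an illustrative format, not the official C-Eval template."""
    shots = examples[:k]
    parts = [f"问题:{q}\n答案:{a}" for q, a in shots]
    parts.append(f"问题:{question}\n答案:")
    return "\n\n".join(parts)

demo = [(f"q{i}", f"a{i}") for i in range(6)]
p = build_few_shot_prompt(demo, "test?")
```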
Good For
- Chinese-speaking Chatbots: Ideal for building conversational agents that interact in Chinese.
- Research on RLHF for Chinese LLMs: Provides a strong baseline for further research and development in this area.
- Applications requiring Llama-2 compatibility: Uses the Llama-2 architecture, so it integrates with existing Llama-2 tooling and ecosystems.