pandaExplosion/opendata-chinese-llama2-chat

Text Generation · Model size: 13B · Quantization: FP8 · Context length: 4K · License: apache-2.0 · Architecture: Transformer

pandaExplosion/opendata-chinese-llama2-chat is a 13-billion-parameter chat model developed by pandaExplosion on top of Meta's Llama-2 architecture. It was trained with a combination of supervised fine-tuning (SFT), reward modeling, and Proximal Policy Optimization (PPO) on entirely open-source datasets, including Chinese-translated versions of Alpaca-CoT, Anthropic/hh-rlhf, and OpenAssistant/oasst1. The model is designed for conversational AI in Chinese and shows competitive performance on the C-Eval benchmark.

Model Overview

opendata-chinese-llama2-chat is a 13-billion-parameter conversational model developed by pandaExplosion on Meta's Llama-2 base. It is the chat member of a three-model suite that also includes an SFT model and a reward model, all trained on fully open-source datasets, and it is optimized specifically for Chinese-language interaction.

Key Capabilities

  • Chinese Conversational AI: Fine-tuned for chat applications in Chinese, leveraging translated open-source datasets (a minimal inference sketch follows this list).
  • Llama-2 Architecture: Builds on the Llama-2 foundation, extended through the SFT, reward-modeling, and PPO training stages.
  • Reinforcement Learning from Human Feedback (RLHF): Incorporates a reward model and PPO training for improved conversational quality and alignment.
  • Open-source Training Data: Utilizes a diverse set of open-source datasets, including over 5 million instructions for SFT and 160k ranking pairs for reward modeling.
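
Because the checkpoint follows the standard Llama-2 layout, it should load with the stock transformers classes. The sketch below is illustrative rather than official usage; in particular, the [INST] chat template is an assumption carried over from Llama-2-chat, since the card does not document the prompt format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pandaExplosion/opendata-chinese-llama2-chat"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # fp16 halves memory; a 13B model still needs ~26 GB
    device_map="auto",          # requires the accelerate package
)

# Assumed Llama-2-chat prompt template; verify against the actual training format.
prompt = "[INST] 请用中文简要介绍一下你自己。 [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```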

Training Details

The model underwent a multi-stage training process using DeepSpeed-Chat:

  • Supervised Fine-Tuning (SFT): Trained for 2 epochs on over 5 million instructions from datasets like QingyiSi/Alpaca-CoT, with a sequence length of 4096 tokens.
  • Reward Model Training: Trained for 2 epochs on 160k ranking pairs from Anthropic/hh-rlhf and OpenAssistant/oasst1 (and their translated versions), using a sequence length of 2048 tokens; a sketch of the pairwise objective follows this list.
  • PPO Stage: Trained for one epoch on 50k prompts, with a sequence length of 2048 tokens.
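
The card does not spell out the reward-modeling objective, but training on ranking pairs typically means a pairwise loss of the shape used in DeepSpeed-Chat's step 2: push the scalar reward of the human-preferred response above that of the rejected one. A minimal sketch, with made-up reward values:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log(sigmoid(r_chosen - r_rejected)), averaged over the batch.

    Each tensor holds one scalar reward per ranking pair, shape (batch,).
    Minimizing this drives chosen rewards above rejected ones.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of 3 ranking pairs with illustrative reward scores.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(pairwise_ranking_loss(chosen, rejected).item())
```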

Performance

On the C-Eval dataset (5-shot), the opendata-chinese-llama2-chat-13B model achieved a Test Average score of 40.5, outperforming both LLaMA-2-13B (36.6) and LLaMA-2-chat-13B (37.2) in the reported evaluations.

Good For

  • Chinese-speaking Chatbots: Ideal for building conversational agents that interact in Chinese.
  • Research on RLHF for Chinese LLMs: Provides a strong baseline for further research and development in this area.
  • Applications requiring Llama-2 compatibility: The checkpoint keeps the standard Llama-2 format, so it drops into existing Llama-2 tooling, as sketched below.
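
As one example of that compatibility, any Llama-2-capable inference engine should accept the checkpoint directly. A sketch using vLLM; the sampling settings here are illustrative, not taken from the model card:

```python
from vllm import LLM, SamplingParams

# Any Llama-2-compatible engine should load the checkpoint as-is.
llm = LLM(model="pandaExplosion/opendata-chinese-llama2-chat")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["请用中文介绍一下大语言模型。"], params)
print(outputs[0].outputs[0].text)
```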