ClueAI/ChatYuan-7B

  • Task: Text generation
  • Model size: 7B
  • Quantization: FP8
  • Context length: 4k
  • Published: Jun 2, 2023
  • License: GPL-3.0
  • Architecture: Transformer

ChatYuan-7B is a 7 billion parameter bilingual (Chinese and English) functional dialogue language model developed by ClueAI. Built on the LLaMA-7B architecture, it underwent a three-stage training process: continued pre-training on 50 billion Chinese tokens, task-oriented instruction fine-tuning on hundreds of task sets, and instruction fine-tuning on millions of human feedback examples. The model targets conversational AI applications and is particularly strong in Chinese language understanding and generation.


Overview

ClueAI/ChatYuan-7B is a 7 billion parameter bilingual (Chinese and English) functional dialogue language model. It is built on the LLaMA-7B architecture and has undergone a comprehensive three-stage training process to enhance its capabilities, especially in Chinese language processing and conversational tasks.

Key Training Stages

  • Stage 1: Continued pre-training on 50 billion Chinese tokens using general Chinese corpora.
  • Stage 2: Task-oriented instruction fine-tuning across hundreds of diverse task sets.
  • Stage 3: Instruction fine-tuning on millions of human feedback examples.

Usage and Merging

To comply with the LLaMA model license, ChatYuan-7B is released as incremental (delta) weights. Users must merge these deltas with the original LLaMA-7B weights to obtain the full ChatYuan-7B model. A Python script, apply_delta.py, is provided for this merge; it combines a LLaMA-7B Hugging Face checkpoint with the ChatYuan-7B delta weights.
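The merge itself amounts to element-wise addition of base and delta parameters, name by name. Below is a minimal, self-contained sketch of that idea, not the actual apply_delta.py: the function name and the dict-of-floats representation are illustrative stand-ins, whereas the real script operates on PyTorch state dicts of tensors.

```python
# Conceptual sketch of delta-weight merging (illustrative only).
# Plain floats stand in for parameter tensors; the real apply_delta.py
# loads PyTorch checkpoints and adds tensors of matching shape.

def apply_delta(base_weights, delta_weights):
    """Reconstruct full weights as full[name] = base[name] + delta[name]."""
    if base_weights.keys() != delta_weights.keys():
        raise ValueError("base and delta checkpoints must share parameter names")
    return {name: base_weights[name] + delta_weights[name]
            for name in base_weights}

# Toy example with two "parameters" per checkpoint.
base = {"layer.0.weight": 0.5, "layer.0.bias": -0.125}
delta = {"layer.0.weight": 0.25, "layer.0.bias": 0.375}
full = apply_delta(base, delta)
print(full)  # {'layer.0.weight': 0.75, 'layer.0.bias': 0.25}
```

Releasing deltas rather than full weights is a common pattern for LLaMA derivatives (e.g. Vicuna), since the base weights cannot be redistributed directly.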

Capabilities

  • Supports both Chinese and English dialogue generation.
  • Demonstrates functional dialogue capabilities, as shown in examples like generating detailed responses to educational questions, continuing articles based on titles, and drafting marketing plans.

Limitations and Restrictions

  • May generate factually incorrect information when asked to follow fact-related instructions.
  • Can occasionally produce harmful responses due to difficulty in identifying potentially harmful instructions.
  • Requires further improvement in reasoning and coding abilities.

Note: The developers restrict the use of this model and its derivatives to research purposes only, prohibiting commercial use and other potentially harmful scenarios due to existing limitations.