openkg/knowlm-13b-diff

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quantization: FP8 · Context Length: 4K · License: apache-2.0 · Architecture: Transformer · Open Weights

openkg/knowlm-13b-diff is a 13 billion parameter differential weight model from the KnowLM project by ZJU NLP, designed to enhance Chinese understanding and knowledge extraction capabilities. It is derived from LLaMA-13B through full-scale pre-training on Chinese, English, and code corpora, followed by instruction fine-tuning for knowledge extraction tasks. The model targets knowledge acquisition and comprehension, addressing challenges such as knowledge-update difficulties and potential discrepancies in large language models.


KnowLM-13B-Diff: Enhancing Knowledge and Chinese Understanding

KnowLM-13B-Diff is a 13 billion parameter differential weight model from the KnowLM project by ZJU NLP, built upon LLaMA-13B. This model addresses challenges in knowledge acquisition and comprehension in large language models, such as knowledge updating difficulties and potential discrepancies. The project's initial phase introduced ZhiXi (智析), a knowledge extraction LLM based on LLaMA.

Key Capabilities & Training

  • Full-scale Pre-training: The model undergoes full-scale pre-training on extensive Chinese corpora (Baidu Baike, Wudao, Chinese Wikipedia), augmented English corpora (including recent Wikipedia data), and high-quality code corpora (GitHub, LeetCode). This significantly enhances Chinese understanding while retaining the model's original English and code capabilities.
  • Instruction Fine-tuning: The model is fine-tuned with LoRA on a large instruction dataset (approximately 1.4 million samples), strengthening its ability to follow human instructions for knowledge extraction tasks.
  • Knowledge Extraction Focus: Optimized for Named Entity Recognition (NER), Relation Extraction (RE), and Information Extraction (IE) through a KG2Instructions approach, so these tasks can be performed via natural-language instructions.
  • Differential Weights: This release provides only differential weights; they must be merged with the original LLaMA-13B weights to restore the full ZhiXi-13B model (see the merging sketch after this list).
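
Merging differential weights typically amounts to element-wise addition of the diff tensors to the corresponding base tensors. The snippet below is a minimal sketch of that idea, assuming both checkpoints load with Hugging Face transformers and share identical parameter names and shapes; the paths are hypothetical, and the KnowLM repository's own merging script should be preferred for actual use.

```python
# Minimal sketch of diff-weight merging (illustrative only).
# Assumes both checkpoints share identical parameter names/shapes;
# prefer the official KnowLM merging script for real use.
# Note: loading two 13B fp16 models requires substantial CPU RAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "path/to/llama-13b"        # original LLaMA-13B weights (hypothetical path)
DIFF = "path/to/knowlm-13b-diff"  # this release's differential weights (hypothetical path)
OUT = "path/to/zhixi-13b"         # destination for the restored model

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
diff = AutoModelForCausalLM.from_pretrained(DIFF, torch_dtype=torch.float16)

# Restore the full model: base + diff, parameter by parameter.
with torch.no_grad():
    for (name_b, p_b), (name_d, p_d) in zip(
        base.named_parameters(), diff.named_parameters()
    ):
        assert name_b == name_d, f"parameter mismatch: {name_b} vs {name_d}"
        p_b.add_(p_d)

base.save_pretrained(OUT)
AutoTokenizer.from_pretrained(DIFF).save_pretrained(OUT)
```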

Use Cases & Limitations

  • Good for: Applications requiring robust Chinese language understanding, knowledge extraction, and instruction-based information processing. It is particularly suited to extracting structured data from text (see the usage sketch after this list).
  • Limitations: Instruction tuning currently uses LoRA rather than full-parameter fine-tuning. The model does not yet support multi-turn conversations, and while efforts were made to ensure harmlessness, toxic outputs may still occur. Pre-training, though extensive, is not exhaustive due to resource constraints.
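
As an illustration of instruction-based extraction, the sketch below sends a single NER-style instruction to the merged ZhiXi-13B model via transformers. The model path and prompt wording are assumptions for illustration; consult the KnowLM documentation for the exact instruction templates the model was tuned on.

```python
# Illustrative single-turn extraction query against the merged model.
# The prompt template here is an assumption; the model does not support
# multi-turn conversation, so each request should be self-contained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/zhixi-13b"  # merged weights from the sketch above (hypothetical path)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

# A simple NER-style instruction (hypothetical template).
prompt = (
    "Instruction: Extract all person and organization entities "
    "from the input text and list them as (entity, type) pairs.\n"
    "Input: Zhang San joined Zhejiang University in 2021.\n"
    "Output:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```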