KnowLM-13B-Diff: Enhancing Knowledge and Chinese Understanding
KnowLM-13B-Diff is a 13-billion-parameter differential weight model from the KnowLM project by ZJU NLP, built upon LLaMA-13B. The model targets common knowledge-related shortcomings of large language models, such as the difficulty of updating stored knowledge and the presence of factual errors. The project's initial phase introduced ZhiXi (智析, "knowledge analysis"), a knowledge extraction LLM based on LLaMA.
Key Capabilities & Training
- Full-scale Pre-training: The model undergoes full-scale pre-training using extensive Chinese corpora (Baidu Baike, Wudao, Chinese Wikipedia), augmented English corpora (including recent Wikipedia data), and high-quality code corpora (GitHub, LeetCode). This process significantly enhances Chinese understanding while retaining original English and code capacities.
- Instruction Fine-tuning: It is fine-tuned with a large instruction dataset (approximately 1.4 million samples) using LoRA, strengthening its ability to follow human instructions for knowledge extraction tasks.
- Knowledge Extraction Focus: Optimized via a KG2Instructions approach for information extraction (IE) tasks such as Named Entity Recognition (NER) and Relation Extraction (RE), so that these tasks can be completed through human instructions.
- Differential Weights: This release provides differential weights only; they must be merged with the original LLaMA-13B weights to recover the full ZhiXi-13B model.
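To make the LoRA fine-tuning mentioned above concrete, here is a minimal pure-Python sketch of the core idea: instead of updating the full weight matrix W, LoRA trains two small low-rank matrices A and B and applies an additive update scaled by alpha / r. The dimensions and values below are tiny and illustrative, not the model's real configuration.

```python
# Minimal sketch of a LoRA weight update: W_eff = W + (alpha / r) * B @ A,
# where B is (d_out x r), A is (r x d_in), and r is the LoRA rank.
# Matrices are plain lists of rows; sizes here are illustrative only.

def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha):
    """Return the merged weight W + (alpha / r) * B @ A."""
    r = len(A)                 # LoRA rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Example: d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]          # r x d_in
B = [[0.5], [0.25]]            # d_out x r
W_eff = lora_effective_weight(W, A, B, alpha=1.0)
```

Because only A and B are trained, the number of tunable parameters is a small fraction of the full matrix, which is why LoRA fine-tuning is feasible at 13B scale.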
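The instruction-driven extraction described above can be sketched as a prompt/parse round trip: the task is phrased as a natural-language instruction, and the model is asked to answer in a machine-parseable format. The template and JSON schema below are hypothetical illustrations, not the project's actual KG2Instructions format.

```python
import json

# Hypothetical sketch of instruction-based relation extraction:
# build a natural-language instruction, then parse the model's
# JSON answer back into structured triples.

def build_re_instruction(text, relation_types):
    """Phrase a relation-extraction task as a human instruction."""
    return (
        "Extract all relation triples from the following text. "
        f"Allowed relations: {', '.join(relation_types)}. "
        "Answer with a JSON list of [head, relation, tail] triples.\n"
        f"Text: {text}"
    )

def parse_triples(model_output):
    """Parse the model's JSON answer into (head, relation, tail) tuples."""
    return [tuple(t) for t in json.loads(model_output)]

prompt = build_re_instruction("Hangzhou is the capital of Zhejiang.",
                              ["capital_of", "located_in"])
# A well-formed model response would then be parsed like this:
triples = parse_triples('[["Hangzhou", "capital_of", "Zhejiang"]]')
```

The same pattern extends to NER by swapping the instruction and schema (e.g. asking for `[entity, type]` pairs instead of triples).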
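Conceptually, recovering the full model from differential weights works as follows, assuming the diff was produced element-wise as diff = tuned − base. In practice you should use the conversion script shipped with the release; the stand-in "tensors" below are flat lists of floats keyed by parameter name.

```python
# Conceptual sketch of restoring full weights from a differential release,
# assuming an element-wise diff: tuned[k] = base[k] + diff[k].
# Real checkpoints hold tensors; flat float lists stand in for them here.

def recover_weights(base, diff):
    """Reconstruct tuned weights by adding the diff to the base, per element."""
    assert base.keys() == diff.keys(), "checkpoints must share parameter names"
    return {name: [b + d for b, d in zip(base[name], diff[name])]
            for name in base}

# Hypothetical parameter names and values for illustration only.
base = {"layers.0.attn.w": [0.1, -0.2], "layers.0.mlp.w": [0.3, 0.4]}
diff = {"layers.0.attn.w": [0.05, 0.1], "layers.0.mlp.w": [-0.1, 0.0]}
full = recover_weights(base, diff)
```

Releasing only the diff keeps the distribution compliant with the original LLaMA license, since the base weights must still be obtained separately.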
Use Cases & Limitations
- Good for: Applications requiring robust Chinese language understanding, knowledge extraction, and instruction-based information processing. It is particularly suited for tasks involving structured data extraction from text.
- Limitations: The current instruction tuning uses LoRA rather than full-parameter tuning. The model does not yet support multi-turn conversations, and while efforts are made to ensure harmlessness, toxic outputs may still occur. The pre-training, while extensive, is not exhaustive due to resource constraints.