openkg/knowlm-13b-diff

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quantization: FP8 · Context Length: 4K · License: apache-2.0 · Architecture: Transformer · Open Weights

openkg/knowlm-13b-diff is a 13 billion parameter differential weight model from the KnowLM project by ZJU NLP, designed to enhance Chinese understanding and knowledge extraction capabilities. It is derived from LLaMA-13B through full-scale pre-training on Chinese, English, and code corpora, followed by instruction fine-tuning for knowledge extraction tasks. The model targets knowledge acquisition and comprehension, addressing challenges such as knowledge-update difficulties and potential discrepancies in large language models.


KnowLM-13B-Diff: Enhancing Knowledge and Chinese Understanding

KnowLM-13B-Diff is a 13 billion parameter differential weight model from the KnowLM project by ZJU NLP, built upon LLaMA-13B. This model addresses challenges in knowledge acquisition and comprehension in large language models, such as knowledge updating difficulties and potential discrepancies. The project's initial phase introduced ZhiXi (智析), a knowledge extraction LLM based on LLaMA.

Key Capabilities & Training

  • Full-scale Pre-training: The model undergoes full-scale pre-training on extensive Chinese corpora (Baidu Baike, Wudao, Chinese Wikipedia), augmented English corpora (including recent Wikipedia data), and high-quality code corpora (GitHub, LeetCode). This significantly enhances Chinese understanding while retaining the model's original English and code capabilities.
  • Instruction Fine-tuning: The model is fine-tuned with LoRA on a large instruction dataset (approximately 1.4 million samples), strengthening its ability to follow human instructions for knowledge extraction tasks.
  • Knowledge Extraction Focus: Optimized for Named Entity Recognition (NER), Relation Extraction (RE), and Information Extraction (IE) through a KG2Instructions approach, so these tasks can be performed via natural-language instructions.
  • Differential Weights: This release provides only differential weights; they must be merged with the original LLaMA-13B weights to restore the full ZhiXi-13B model (see the merging sketch after this list).
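
Merging differential weights typically amounts to element-wise addition of the diff tensors to the corresponding base tensors. The snippet below is a minimal sketch of that idea, assuming both checkpoints load with Hugging Face transformers and share identical parameter names and shapes; the paths are hypothetical, and the KnowLM repository's own merging script should be preferred for actual use.

```python
# Minimal sketch of diff-weight merging (illustrative only).
# Assumes both checkpoints share identical parameter names/shapes;
# prefer the official KnowLM merging script for real use.
# Note: loading two 13B fp16 models requires substantial CPU RAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "path/to/llama-13b"        # original LLaMA-13B weights (hypothetical path)
DIFF = "path/to/knowlm-13b-diff"  # this release's differential weights (hypothetical path)
OUT = "path/to/zhixi-13b"         # destination for the restored model

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
diff = AutoModelForCausalLM.from_pretrained(DIFF, torch_dtype=torch.float16)

# Restore the full model: base + diff, parameter by parameter.
with torch.no_grad():
    for (name_b, p_b), (name_d, p_d) in zip(
        base.named_parameters(), diff.named_parameters()
    ):
        assert name_b == name_d, f"parameter mismatch: {name_b} vs {name_d}"
        p_b.add_(p_d)

base.save_pretrained(OUT)
AutoTokenizer.from_pretrained(DIFF).save_pretrained(OUT)
```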

Use Cases & Limitations

  • Good for: Applications requiring robust Chinese language understanding, knowledge extraction, and instruction-based information processing. It is particularly suited to extracting structured data from text (see the usage sketch after this list).
  • Limitations: Instruction tuning currently uses LoRA rather than full-parameter fine-tuning. The model does not yet support multi-turn conversations, and while efforts were made to ensure harmlessness, toxic outputs may still occur. Pre-training, though extensive, is not exhaustive due to resource constraints.
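
As an illustration of instruction-based extraction, the sketch below sends a single NER-style instruction to the merged ZhiXi-13B model via transformers. The model path and prompt wording are assumptions for illustration; consult the KnowLM documentation for the exact instruction templates the model was tuned on.

```python
# Illustrative single-turn extraction query against the merged model.
# The prompt template here is an assumption; the model does not support
# multi-turn conversation, so each request should be self-contained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/zhixi-13b"  # merged weights from the sketch above (hypothetical path)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

# A simple NER-style instruction (hypothetical template).
prompt = (
    "Instruction: Extract all person and organization entities "
    "from the input text and list them as (entity, type) pairs.\n"
    "Input: Zhang San joined Zhejiang University in 2021.\n"
    "Output:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```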