zjunlp/zhixi-13b-diff

Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4k · Published: May 23, 2023 · License: apache-2.0 · Architecture: Transformer

The zjunlp/zhixi-13b-diff model is a 13-billion-parameter, LLaMA-based large language model developed by ZJUNLP. It is a differential-weight release designed to enhance Chinese language understanding and knowledge extraction while retaining the base model's English and code capabilities. The model is optimized for knowledge-centric tasks, including instruction-driven information extraction (NER, RE, IE), and also supports general abilities such as translation, coding, and reasoning.


Overview

zjunlp/zhixi-13b-diff is a 13-billion-parameter large language model from the LLaMA family, developed by ZJUNLP as part of the KnowLM project. This release contains the weight difference between LLaMA-13B and ZhiXi-13B, a model focused on enhanced knowledge acquisition and comprehension, particularly in Chinese.

Key Capabilities

  • Enhanced Chinese Understanding: Full-scale pre-training on Chinese corpora strengthens the model's grasp of Chinese without compromising its original English and code capabilities.
  • Knowledge Extraction: Optimized for information extraction (IE) tasks such as Named Entity Recognition (NER) and Relation Extraction (RE), driven by human instructions generated with the project's KG2Instructions technique.
  • Instruction Following: Fine-tuned with a 1.4 million Chinese instruction dataset to bolster understanding of human instructions.
  • Multilingual Support: Demonstrates capabilities in English and Chinese for tasks such as translation, coding, and general reasoning.
  • Differential Weights: Released as differential weights that must be merged with the base LLaMA-13B model to recover the full model (see the sketch after this list).
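
A minimal merging sketch in Python, assuming the diff checkpoint stores parameter-wise deltas in the same tensor layout as the base model; all paths are placeholders, and the KnowLM repository's own recovery tooling should be preferred in practice:

```python
import torch
from transformers import AutoModelForCausalLM

BASE_PATH = "./llama-13b-hf"    # converted LLaMA-13B base weights (placeholder)
DIFF_PATH = "./zhixi-13b-diff"  # this repository's diff weights (placeholder)
OUT_PATH = "./zhixi-13b"        # output directory for the merged model

base = AutoModelForCausalLM.from_pretrained(BASE_PATH, torch_dtype=torch.float16)
diff = AutoModelForCausalLM.from_pretrained(DIFF_PATH, torch_dtype=torch.float16)

# Recover ZhiXi-13B = LLaMA-13B + diff, applied element-wise per tensor.
diff_state = diff.state_dict()
with torch.no_grad():
    for name, param in base.state_dict().items():
        param.add_(diff_state[name])

base.save_pretrained(OUT_PATH)
```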

Good For

  • Knowledge-intensive applications: Especially those requiring precise information extraction from text.
  • Bilingual (Chinese/English) NLP tasks: Where strong performance in both languages is crucial.
  • Research and Development: Provides open-source pre-training and LoRA instruction-tuning code for further experimentation and model development.
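
As a quick usage illustration, the merged model can be queried with an instruction-style extraction prompt via Hugging Face transformers; the prompt wording below is hypothetical, not the official KnowLM template:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "./zhixi-13b"  # merged model from the sketch above (placeholder)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative information-extraction instruction; consult the KnowLM docs
# for the exact prompt templates used during instruction tuning.
prompt = (
    "Extract all (head entity, relation, tail entity) triples from the text.\n"
    "Text: Zhejiang University is located in Hangzhou, China.\n"
    "Triples:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```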