hkust-nlp/deita-complexity-scorer

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Oct 25, 2023 · License: apache-2.0 · Architecture: Transformer

The hkust-nlp/deita-complexity-scorer is a model developed by HKUST NLP and fine-tuned from Llama-1-13b-hf to automatically annotate the instruction complexity of Supervised Fine-Tuning (SFT) data. It assigns a numerical complexity score to user queries, making it a practical tool for automatic data selection in Large Language Model (LLM) instruction tuning: high-quality SFT datasets can be curated by scoring instructions and filtering on complexity.


Deita Complexity Scorer: Automatic Instruction Complexity Annotation

The Deita Complexity Scorer, developed by HKUST NLP, is a specialized model fine-tuned from Llama-1-13b-hf. It automatically annotates the instruction complexity of Supervised Fine-Tuning (SFT) data, playing a key role in the Deita project's goal of automatic data selection for Large Language Models (LLMs).

Key Capabilities

  • Automated Complexity Scoring: Assigns a numerical complexity score (1-6) to user instructions.
  • Data Selection Enhancement: Supports curation of high-quality SFT datasets by filtering instructions on their complexity scores.
  • Fine-tuned Performance: Leverages a Llama-1-13b-hf base model, specifically adapted for this annotation task.
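The model card does not prescribe an inference recipe, but one common way to read a 1-6 score out of a causal LM is to inspect the next-token distribution over the score tokens. The sketch below assumes a Hugging Face `transformers` setup; the prompt template and the probability-weighted score extraction are assumptions modeled on the Deita project's described approach, not a verified API.

```python
# Sketch: extracting a 1-6 complexity score from the scorer's next-token
# distribution. Prompt template and score extraction are assumptions; consult
# the Deita repository for the template actually used during fine-tuning.
from typing import Dict


def expected_complexity(score_probs: Dict[str, float]) -> float:
    """Collapse probabilities over the score tokens "1".."6" into a single
    probability-weighted average complexity value."""
    total = sum(score_probs.values())
    return sum(int(tok) * p for tok, p in score_probs.items()) / total


def score_instruction(instruction: str,
                      model_id: str = "hkust-nlp/deita-complexity-scorer") -> float:
    """Score one instruction with the complexity scorer (illustrative sketch)."""
    # Heavy dependencies imported lazily so the pure helper above stays
    # dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    # Hypothetical prompt template (an assumption, see lead-in above).
    prompt = (
        "You are a helpful assistant. Please identify the complexity score "
        f"of the following user query.\n##Query: {instruction}\n##Complexity: "
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]

    # Probability mass the model places on each candidate score token.
    probs = torch.softmax(next_token_logits, dim=-1)
    score_probs = {
        s: probs[tokenizer.convert_tokens_to_ids(s)].item()
        for s in ["1", "2", "3", "4", "5", "6"]
    }
    return expected_complexity(score_probs)
```

Taking the expectation over the score tokens, rather than greedily decoding a single digit, yields a continuous value that is easier to threshold when ranking a dataset.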

Good for

  • Researchers and developers working on instruction tuning for LLMs.
  • Automating the selection and filtering of training data based on instruction complexity.
  • Improving the efficiency and quality of SFT dataset creation.
  • Analyzing the complexity distribution of existing instruction datasets.
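Once instructions are scored, the selection step itself is a simple threshold filter. The scores below are hypothetical placeholders standing in for the model's output; in practice they would come from running the scorer over the dataset.

```python
# Sketch: filtering an SFT dataset by complexity score. The scores here are
# hypothetical placeholders for output from the deita-complexity-scorer.
from typing import List, Tuple


def filter_by_complexity(scored: List[Tuple[str, float]],
                         min_score: float = 3.0) -> List[str]:
    """Keep only instructions whose complexity score meets the threshold."""
    return [instr for instr, score in scored if score >= min_score]


dataset = [
    ("What is 2 + 2?", 1.2),                            # hypothetical score
    ("Summarize this paper in three sentences.", 3.4),  # hypothetical score
    ("Design a distributed cache with failover.", 5.1), # hypothetical score
]
print(filter_by_complexity(dataset))  # drops the trivial arithmetic query
```

The same list of `(instruction, score)` pairs can also feed a histogram to analyze the complexity distribution of an existing dataset before deciding on a cutoff.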