hkust-nlp/deita-complexity-scorer

7B
FP8
4096
Oct 25, 2023
License: apache-2.0

Deita Complexity Scorer: Automatic Instruction Complexity Annotation

The Deita Complexity Scorer, developed by HKUST NLP, is a model fine-tuned from Llama-1-13b-hf to automatically annotate the instruction complexity of Supervised Fine-Tuning (SFT) data. It is a component of the Deita project, which aims to automate data selection for instruction tuning of Large Language Models (LLMs).

Key Capabilities

  • Automated Complexity Scoring: Assigns a numerical complexity score (1-6) to user instructions.
  • Data Selection Enhancement: Helps curate high-quality SFT datasets by identifying and filtering instructions according to their complexity.
  • Task-specific Fine-tuning: Built on a Llama-1-13b-hf base model and fine-tuned specifically for complexity annotation.
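This card does not say how the numeric 1-6 score is read out of the model. One common approach for scorers of this kind, assumed here rather than taken from this card, is to take the next-token logits for the digits "1" through "6" right after the prompt, apply a softmax over just those six values, and compute the probability-weighted average score. That yields a continuous value in [1, 6] rather than a hard integer label. The sketch below shows only that arithmetic; `expected_complexity` is a hypothetical helper, and extracting the logits from the model is left out.

```python
import math
from typing import Sequence


def expected_complexity(score_logits: Sequence[float]) -> float:
    """Turn six score-token logits into a continuous score in [1, 6].

    Assumption (not confirmed by this card): score_logits[i] is the
    model's next-token logit for the digit str(i + 1), read at the
    position immediately after the scoring prompt.
    """
    if len(score_logits) != 6:
        raise ValueError("expected exactly one logit per score 1-6")
    # Numerically stable softmax restricted to the six candidate scores.
    peak = max(score_logits)
    exps = [math.exp(x - peak) for x in score_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Probability-weighted average of the scores 1..6.
    return sum(p * score for p, score in zip(probs, range(1, 7)))
```

With equal logits the result is 3.5, the midpoint of the scale; a single dominant logit pulls the score toward that digit.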

Good for

  • Researchers and developers working on instruction tuning for LLMs.
  • Automating the selection and filtering of training data based on instruction complexity.
  • Improving the efficiency and quality of SFT dataset creation.
  • Analyzing the complexity distribution of existing instruction datasets.
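To illustrate the filtering and distribution-analysis uses listed above, the sketch below assumes that each SFT record is a dict that already carries a `complexity` score in [1, 6] produced by the scorer. `select_by_complexity` and `complexity_histogram` are hypothetical helpers. The threshold-and-rank rule is a simple stand-in and is not the Deita project's own selection strategy.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def select_by_complexity(records: List[Record], min_score: float,
                         top_k: Optional[int] = None) -> List[Record]:
    """Keep records scoring at least min_score, most complex first."""
    kept = [r for r in records if float(r["complexity"]) >= min_score]
    kept.sort(key=lambda r: float(r["complexity"]), reverse=True)
    # Optionally truncate to a fixed data budget.
    return kept if top_k is None else kept[:top_k]


def complexity_histogram(records: List[Record]) -> Counter:
    """Count records per bucket [1,2), [2,3), [3,4), [4,5), [5,6]."""
    # int() floors positive scores; an exact 6.0 is folded into bucket 5.
    return Counter(min(int(float(r["complexity"])), 5) for r in records)
```

For example, with scores 1.2, 3.1, 4.7, and 5.9, a `min_score` of 3.0 keeps the last three, ordered 5.9, 4.7, 3.1, and the histogram places one record in each of buckets 1, 3, 4, and 5.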