Overview
Deita Complexity Scorer: Automatic Instruction Complexity Annotation
The Deita Complexity Scorer, developed by HKUST NLP, is a 13-billion-parameter model fine-tuned from Llama-1-13b-hf. It automatically annotates the instruction complexity of Supervised Fine-Tuning (SFT) data, supporting the Deita project's goal of automatic data selection for Large Language Models (LLMs).
Key Capabilities
- Automated Complexity Scoring: Assigns each user instruction a numerical complexity score from 1 (simplest) to 6 (most complex).
- Data Selection Enhancement: Helps curate high-quality SFT datasets by ranking or filtering instructions according to their complexity score.
- Task-Specific Fine-Tuning: Built on the Llama-1-13b-hf base model and adapted specifically for this annotation task.
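To make the 1-6 scoring concrete, here is a minimal sketch of one common way to reduce a model's preference over discrete score labels to a single number: take a softmax over the six label logits and compute the probability-weighted average. The function name `expected_score` and the idea that the input is a list of six raw logits are assumptions made for illustration; the scorer's actual inference code may differ.

```python
import math

def expected_score(logits):
    """Return a probability-weighted complexity score in [1, 6].

    `logits` is a hypothetical list of six raw model outputs, one per
    score label 1..6 (an assumption for this sketch).
    """
    if len(logits) != 6:
        raise ValueError("expected exactly six logits, one per score 1..6")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Weight each score label by its probability and sum.
    return sum(p * score for p, score in zip(probs, range(1, 7)))

# Uniform logits give no preference, so the result is the midpoint 3.5.
print(expected_score([0.0] * 6))
```

A weighted average like this yields a continuous score, which makes ranking instructions easier than a single hard label would.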
Good for
- Researchers and developers working on instruction tuning for LLMs.
- Automating the selection and filtering of training data based on instruction complexity.
- Improving the efficiency and quality of SFT dataset creation.
- Analyzing the complexity distribution of existing instruction datasets.
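The filtering and distribution-analysis use cases above can be sketched as follows. This assumes each record has already been scored and stores the result under a hypothetical `complexity` key; the field names, the threshold, and the helper functions are illustrative and not part of the Deita tooling.

```python
from collections import Counter

def filter_by_complexity(records, min_score=3.0):
    """Keep records whose (hypothetical) 'complexity' field meets the threshold."""
    return [r for r in records if r["complexity"] >= min_score]

def complexity_histogram(records):
    """Count records per integer bucket 1..6 (scores are floored, then clamped)."""
    buckets = (min(6, max(1, int(r["complexity"]))) for r in records)
    return Counter(buckets)

# Made-up sample data, for illustration only.
sample = [
    {"instruction": "Say hello.", "complexity": 1.2},
    {"instruction": "Summarize this article in three bullet points.", "complexity": 3.0},
    {"instruction": "Design a rate limiter and justify the trade-offs.", "complexity": 5.7},
]

kept = filter_by_complexity(sample, min_score=3.0)
print([r["instruction"] for r in kept])
print(sorted(complexity_histogram(sample).items()))
```

A fixed threshold is the simplest policy; keeping the top-k records by score is an equally reasonable alternative when a target dataset size is known.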