InsTagger by OFA-Sys is a 7-billion-parameter auto-regressive model, fine-tuned from LLaMA-2, designed to automatically assign instruction tags to queries in supervised fine-tuning (SFT) data. It distills the tagging results of the InsTag system, which analyzes SFT data for LLM alignment with human preferences. The model is optimized for data preparation in LLM alignment, enabling the creation of high-quality instruction datasets.
What is OFA-Sys/InsTagger?
InsTagger is a 7-billion-parameter auto-regressive model developed by OFA-Sys and fine-tuned from LLaMA-2. Its primary function is to automatically assign instruction tags to queries within supervised fine-tuning (SFT) datasets. It does this by distilling the tagging results of the original InsTag system, which analyzes SFT data to improve LLM alignment with human preferences.
Key Capabilities
- Automated Instruction Tagging: Efficiently tags queries in SFT data, streamlining the data preparation phase for LLM training.
- Data Distillation: Leverages the insights from the more complex InsTag system to provide accessible local tagging capabilities.
- LLM Alignment Support: Facilitates the creation of high-quality, preference-aligned SFT datasets, which have been shown to improve LLM performance (e.g., TagLM models outperforming other open-source LLMs on MT-Bench).
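Once the model has tagged a query, the tags still need to be extracted from its raw text response before they can be attached to an SFT dataset. The snippet below is a minimal sketch of that post-processing step, assuming an InsTag-style output format of a JSON list of `{"tag": ..., "explanation": ...}` objects; the exact schema InsTagger emits should be verified against real model output.

```python
import json

def parse_instag_output(raw: str) -> list[str]:
    """Extract tag names from an InsTag-style JSON response.

    Assumes the model emits a JSON list of objects such as
    [{"tag": "code generation", "explanation": "..."}]; this schema
    is an assumption and should be checked against actual output.
    """
    records = json.loads(raw)
    return [item["tag"] for item in records if "tag" in item]

# Hypothetical model output, for illustration only:
raw = '[{"tag": "code generation", "explanation": "asks for a Python script"}]'
print(parse_instag_output(raw))  # prints ['code generation']
```

Collecting these tag lists across a corpus gives a per-query label set that can be used to measure the diversity and complexity of an SFT dataset, which is the role InsTag plays in data selection.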
Good For
- SFT Data Preparation: Ideal for developers and researchers looking to process and tag large volumes of SFT data for training or fine-tuning large language models.
- Improving LLM Alignment: Useful for enhancing the quality and human preference alignment of instruction datasets.
- FastChat Integration: Directly compatible with FastChat for easy inference and serving using the Vicuna template.
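Because inference goes through the Vicuna template, queries must be wrapped in that chat format before being passed to the model. The sketch below builds a prompt using the Vicuna v1.1 template wording as used by FastChat's conversation templates; both the exact wording and the commented-out generation call are assumptions worth verifying against the model card rather than a definitive recipe.

```python
def build_vicuna_prompt(query: str) -> str:
    """Wrap a query in the Vicuna v1.1 chat template.

    The system-prompt wording is assumed from FastChat's conversation
    templates and should be double-checked for InsTagger specifically.
    """
    system = (
        "A chat between a curious user and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the user's questions."
    )
    return f"{system} USER: {query} ASSISTANT:"

prompt = build_vicuna_prompt("Write a bash script that renames all .txt files.")
print(prompt)

# Hedged sketch of direct inference with Hugging Face transformers
# (downloads ~7B-parameter weights, so it is not run here):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("OFA-Sys/InsTagger")
# model = AutoModelForCausalLM.from_pretrained("OFA-Sys/InsTagger")
# inputs = tokenizer(prompt, return_tensors="pt")
# output = model.generate(**inputs, max_new_tokens=128)
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Serving through FastChat itself avoids building the template by hand, since FastChat applies the Vicuna conversation format automatically when loading the model.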