UniNER-7B-type Overview

UniNER-7B-type is a 7-billion-parameter model fine-tuned from LLaMA-7B for Universal Named Entity Recognition (NER). Given a text passage, it extracts the entities and classifies them by type. It was trained on the Pile-NER-type dataset, which was generated synthetically by prompting GPT-3.5-turbo-0301 to label entities in text passages and assign each one a type tag, so no human-annotated data was needed.
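The sketch below shows how synthetic labels of this kind can be reshaped into per-type answers, which is the same shape of answer the model returns at inference (one JSON list of entity strings per queried type). It assumes the annotator returns a JSON list of [entity, type] pairs for each passage. That input format, the sample passage labels, and the helper name `group_by_type` are illustrative assumptions, not the published Pile-NER-type pipeline.

```python
import json
from collections import defaultdict


def group_by_type(annotations: list[list[str]]) -> dict[str, list[str]]:
    """Group [entity, type] pairs into one list of entity strings per type.

    Assumption: each annotation is a two-element list [entity, type],
    as a hypothetical annotator might return it.
    """
    by_type: dict[str, list[str]] = defaultdict(list)
    for entity, entity_type in annotations:
        # Keep the first occurrence of each entity and drop repeated mentions.
        if entity not in by_type[entity_type]:
            by_type[entity_type].append(entity)
    return dict(by_type)


# Hypothetical annotator output for one passage, in the assumed format.
raw = '[["Barack Obama", "person"], ["Honolulu", "location"], ["Obama", "person"]]'
targets = group_by_type(json.loads(raw))

for entity_type, entities in targets.items():
    # Each type maps to a JSON list of strings.
    print(entity_type, json.dumps(entities))
```

Grouping by type mirrors the one-type-per-query interface: each type tag becomes a separate question whose answer is a JSON list.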

Key Capabilities and Differentiators

Universal NER Performance: UniNER-7B-type demonstrates strong performance on the Universal NER benchmark, which encompasses 43 academic datasets spanning nine diverse domains.
Entity Tag Handling: It is optimized for queries in which the entity type is supplied as a short tag, such as "person" or "location".
Synthetic Data Training: Because all of its training labels were produced by GPT-3.5-turbo-0301, the model demonstrates an approach to NER data collection that avoids manual annotation.
Comparison to UniNER-7B-definition: UniNER-7B-type is the better choice when entity types are given as tags and for broad NER. Its counterpart, UniNER-7B-definition, is better suited to entity types described by short sentences and is more robust to paraphrased type names.

Use Cases and Inference

The model is intended for research on named entity recognition where the goal is to extract entities and their types. At inference time, you supply a text passage and ask for one entity type, and the model returns its predictions in JSON format. Extracting several entity types therefore requires a separate query for each type.
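A minimal sketch of the one-query-per-type loop, assuming the model is driven by a conversation-style prompt and answers with a JSON list of strings. The exact prompt wording in `PROMPT_TEMPLATE` is an assumption and should be checked against the full model card. `generate` is a hypothetical stand-in for whatever function runs the model, and `fake_generate` returns canned answers so the example runs without loading any weights.

```python
import json
from typing import Callable

# Assumed conversation-style template (verify against the official model card).
PROMPT_TEMPLATE = (
    "A virtual assistant answers questions from a user based on the provided text.\n"
    "USER: Text: {text}\n"
    "ASSISTANT: I've read this text.\n"
    "USER: What describes {entity_type} in the text?\n"
    "ASSISTANT:"
)


def extract_entities(text: str, entity_types: list[str],
                     generate: Callable[[str], str]) -> dict[str, list[str]]:
    """Send one query per entity type and parse each JSON answer."""
    results: dict[str, list[str]] = {}
    for entity_type in entity_types:
        prompt = PROMPT_TEMPLATE.format(text=text, entity_type=entity_type)
        completion = generate(prompt)
        try:
            parsed = json.loads(completion.strip())
        except json.JSONDecodeError:
            parsed = []  # treat malformed output as "no entities found"
        if not isinstance(parsed, list):
            parsed = []
        # Keep only string entries, matching the expected JSON list of strings.
        results[entity_type] = [e for e in parsed if isinstance(e, str)]
    return results


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the real model: canned JSON per queried type."""
    if "describes person" in prompt:
        return '["Barack Obama"]'
    if "describes location" in prompt:
        return '["Honolulu"]'
    return "[]"


result = extract_entities("Barack Obama was born in Honolulu.",
                          ["person", "location", "date"], fake_generate)
print(result)
```

Asking for three types costs three model calls here. Treating unparseable output as an empty list is a design choice for robustness. Logging the raw completion instead may be preferable when debugging.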
