GoLLIE-7B: Guideline-Following LLM for Information Extraction
GoLLIE-7B, developed by the HiTZ Basque Center for Language Technology, is a 7 billion parameter Large Language Model (LLM) fine-tuned from Code-LLaMA. Its core innovation is its ability to follow user-defined annotation guidelines for Information Extraction (IE) tasks, enabling zero-shot extraction on tasks it was never trained on. Unlike conventional supervised IE systems, GoLLIE-7B reads detailed label definitions and schemas supplied at inference time, allowing flexible, dynamic IE without task-specific training data.
Key Capabilities
- Zero-Shot Information Extraction: Performs IE on unseen tasks by interpreting user-defined annotation guidelines.
- Dynamic Schema Definition: Users can define extraction schemas on the fly using Python classes and docstrings, as demonstrated in the provided examples.
- Improved Performance: Outperforms previous zero-shot IE approaches by leveraging explicit guidelines.
- English Language Support: Optimized for English NLP tasks.
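The dynamic-schema idea above can be sketched as follows. This is a minimal, hypothetical example of how a GoLLIE-style prompt is assembled: the schema is written as Python dataclass source whose docstrings act as the annotation guidelines, followed by the input text and a cue for the model to emit the result. The class names (`Person`, `Organization`) and the exact prompt layout are illustrative assumptions; the official GoLLIE Notebooks may differ in detail.

```python
from dataclasses import dataclass

# Schema as Python source code: in GoLLIE-style prompting, the class
# docstrings serve as the annotation guidelines the model must follow.
# These class names are illustrative, not taken from the official notebooks.
SCHEMA = '''
@dataclass
class Person:
    """A human individual mentioned by name in the text."""
    span: str  # exact surface form of the mention

@dataclass
class Organization:
    """A company, institution, or government agency."""
    span: str  # exact surface form of the mention
'''

def build_prompt(text: str) -> str:
    """Assemble a GoLLIE-style prompt: guideline classes first, then the
    input text, then the cue after which the model writes the extractions."""
    return f"{SCHEMA}\ntext = {text!r}\nresult ="

prompt = build_prompt("Ada Lovelace worked with Charles Babbage.")
```

Because the schema lives in the prompt rather than in the model weights, changing the extraction task is just a matter of editing these class definitions and docstrings.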
When to Use GoLLIE-7B
GoLLIE-7B is ideal for developers and researchers needing to extract structured information from unstructured text, especially when:
- Custom Extraction Is Required: you need to define entities and relations that pre-trained models do not cover.
- Rapid Prototyping Is the Goal: you want to set up and test an information extraction pipeline without labeling a dataset first.
- Labeled Data Is Scarce: you are applying IE to domains or tasks where annotated training data is limited or non-existent.
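Once the model has generated a completion, the output is itself Python: a list literal of instances of the schema classes. A small sketch of how such a completion could be parsed back into structured objects, assuming the output format shown here (the sample `completion` string is fabricated for illustration):

```python
from dataclasses import dataclass

@dataclass
class Person:
    """A human individual mentioned by name in the text."""
    span: str  # exact surface form of the mention

# Hypothetical model completion in the GoLLIE style: a Python list literal
# of schema-class instances (the real output formatting may vary).
completion = '[Person(span="Ada Lovelace"), Person(span="Charles Babbage")]'

def parse_result(completion: str, schema_classes) -> list:
    """Evaluate the completion in a namespace restricted to the schema
    classes, so only the defined dataclasses can be instantiated."""
    namespace = {cls.__name__: cls for cls in schema_classes}
    return eval(completion, {"__builtins__": {}}, namespace)

entities = parse_result(completion, [Person])
```

Restricting the evaluation namespace to the schema classes keeps arbitrary code in a malformed completion from executing; a production pipeline would likely add further validation of the generated string.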
For detailed usage and examples, refer to the GoLLIE Notebooks.