Overview
GoLLIE: Guideline-following Large Language Model for Information Extraction
GoLLIE (Guideline-following Large Language Model for Information Extraction) is a 13-billion-parameter model developed by the HiTZ Basque Center for Language Technology. It is fine-tuned from Code-LLaMA, and its core strength is zero-shot information extraction (IE) that strictly follows user-defined annotation guidelines. Other models rely largely on the label knowledge they acquired during training. GoLLIE instead reads detailed definitions and instructions supplied as Python classes and docstrings, and extracts information according to them.
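To make the guidelines-as-code idea concrete, here is a minimal sketch in Python. The classes (`Person`, `Location`), the helpers `render_class` and `build_prompt`, and the prompt layout are illustrative assumptions loosely modeled on the approach described above. The layout places the guidelines first, then `text = ...`, then an open `result = [` list for the model to complete. None of this is GoLLIE's official API or prompt template. Per-attribute guidance is stored in dataclass field metadata because source comments are not available at runtime.

```python
from dataclasses import dataclass, field, fields


# Hypothetical guideline classes: the class docstring holds the label definition,
# and field metadata holds per-attribute guidance (an assumption for this sketch).
@dataclass
class Person:
    """Refers to the name of a real or fictional human being."""

    span: str = field(metadata={"doc": "The exact name as it appears in the text."})


@dataclass
class Location:
    """Refers to a geographical or political place, such as a city or country."""

    span: str = field(metadata={"doc": "The exact place name as it appears in the text."})


def render_class(cls) -> str:
    """Render one guideline class as Python-like source text for the prompt."""
    lines = ["@dataclass", f"class {cls.__name__}:", f'    """{cls.__doc__}"""', ""]
    for f in fields(cls):
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f"    {f.name}: {type_name}  # {f.metadata.get('doc', '')}")
    return "\n".join(lines)


def build_prompt(classes, text: str) -> str:
    """Assemble guidelines, the input text, and an open result list (hypothetical layout)."""
    guidelines = "\n\n\n".join(render_class(c) for c in classes)
    return f"{guidelines}\n\n\ntext = {text!r}\n\nresult = ["


prompt = build_prompt([Person, Location], "Ane Aranburu moved to Donostia in 2019.")
print(prompt)
```

Swapping in a different set of classes changes the labels and definitions the model is asked to follow, without retraining.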
Key Capabilities
- Zero-Shot Information Extraction: Outperforms previous approaches at extracting structured data for tasks that have no task-specific training examples.
- Dynamic Schema Definition: Users can define annotation schemas on the fly using Python classes and docstrings, offering high flexibility.
- Guideline Adherence: Relies on explicit guidelines for extraction, making it robust to novel or complex information extraction tasks.
- Performance: The 13B variant achieves an average zero-shot F1 score of 56.0 on guideline-driven extraction.
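Because schemas are ordinary Python classes, they can also be generated at runtime instead of being written by hand. The sketch below uses the standard library's `dataclasses.make_dataclass` to build a guideline class from a name, a definition, and per-attribute guidance. The helper `define_label` and the `Launcher` schema are hypothetical examples and are not part of GoLLIE.

```python
from dataclasses import field, fields, make_dataclass


def define_label(name: str, definition: str, attributes: dict):
    """Hypothetical helper: create a guideline class at runtime.

    `attributes` maps each field name to a short guidance string, which is
    stored in the field's metadata.
    """
    cls = make_dataclass(
        name,
        [(attr, str, field(metadata={"doc": doc})) for attr, doc in attributes.items()],
    )
    cls.__doc__ = definition  # the class docstring carries the label definition
    return cls


# Hypothetical schema defined on the fly.
Launcher = define_label(
    "Launcher",
    "Refers to a vehicle designed primarily to transport payloads from Earth's surface to space.",
    {
        "mention": "The name of the launcher vehicle.",
        "space_company": "The company that operates the launcher.",
    },
)

print(Launcher.__doc__)
for f in fields(Launcher):
    print(f.name, "->", f.metadata["doc"])
```

A class built this way can be rendered into a prompt exactly like a hand-written one, so schemas can come from a config file or a user interface.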
Good For
- Custom Information Extraction: Ideal for scenarios requiring extraction of specific entities or relations where predefined schemas are not available or need frequent modification.
- Research and Development: Useful for exploring new information extraction tasks and methodologies due to its flexible guideline-following approach.
- Structured Data Generation: Can be leveraged to transform unstructured text into structured formats based on user-defined rules.
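To produce structured data, the model's output still has to be converted into records. Assume the model completes an open `result = [` list with calls such as `Person(span="...")`. This is an assumption about the output format, not a documented contract. Under that assumption, a minimal sketch can parse the text with Python's `ast` module instead of `eval`, so only known guideline classes with literal keyword arguments are instantiated. `parse_result` is a hypothetical helper.

```python
import ast
from dataclasses import dataclass


@dataclass
class Person:
    """Refers to the name of a real or fictional human being."""
    span: str


@dataclass
class Location:
    """Refers to a geographical or political place, such as a city or country."""
    span: str


def parse_result(completion: str, classes) -> list:
    """Hypothetical helper: parse a Python-style annotation list without eval().

    Assumes `completion` continues a prompt ending in "result = [".
    Calls to unknown classes, positional arguments, and non-literal values are skipped.
    """
    allowed = {cls.__name__: cls for cls in classes}
    try:
        tree = ast.parse("[" + completion, mode="eval")
    except SyntaxError:
        return []  # truncated or malformed generation
    if not isinstance(tree.body, ast.List):
        return []
    records = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        cls = allowed.get(node.func.id)
        if cls is None or node.args:
            continue
        try:
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            records.append(cls(**kwargs))
        except (ValueError, TypeError):
            continue  # non-literal value or wrong/missing fields
    return records


completion = 'Person(span="Ane Aranburu"), Location(span="Donostia"), Organization(span="HiTZ")]'
records = parse_result(completion, [Person, Location])
print(records)  # [Person(span='Ane Aranburu'), Location(span='Donostia')]
```

`Organization` is dropped because it is not among the supplied guideline classes. Parsing with `ast` keeps generated code from executing. The trade-off is that a truncated completion yields an empty list rather than partial results.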