GoLLIE-13B: Guideline-following LLM for Information Extraction
GoLLIE-13B is a 13-billion-parameter large language model developed by the HiTZ Basque Center for Language Technology for zero-shot Information Extraction (IE). Unlike general-purpose LLMs, GoLLIE is trained to read and follow explicit annotation guidelines, so users can define or change the extraction schema at inference time.
Key Capabilities
- Guideline-driven Information Extraction: Processes text to extract structured information based on user-defined annotation schemas, provided as Python classes with docstrings.
- Zero-shot Performance: Performs information extraction on new tasks and label sets without task-specific training examples, relying on detailed guidelines instead.
- Flexible Schema Definition: Users can define custom entities and relationships on the fly, making it adaptable to diverse IE needs.
- Fine-tuned from CODE-LLaMA2: Builds on a code-pretrained base model, which is a natural fit for guidelines and schemas expressed as Python classes.
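As a rough illustration of the guideline-as-code idea above, the sketch below defines two entity types as Python dataclasses with docstrings and renders them into a code-style prompt. The class names (`Launcher`, `Mission`), the helpers `render_class` and `build_prompt`, and the exact prompt layout are assumptions made for illustration, not the model's documented template; the official GoLLIE repository should be consulted for the real format.

```python
import inspect
import textwrap
from dataclasses import dataclass, fields


# Hypothetical schema: each docstring plays the role of an annotation guideline.
@dataclass
class Launcher:
    """Refers to a vehicle designed primarily to transport payloads
    from the Earth's surface to space, such as rockets."""

    mention: str


@dataclass
class Mission:
    """Any planned or accomplished journey beyond Earth's atmosphere
    with specific objectives, either crewed or uncrewed."""

    mention: str


def render_class(cls) -> str:
    """Render a dataclass as source-like text: name, docstring, typed fields."""
    doc = inspect.cleandoc(cls.__doc__ or "")
    lines = ["@dataclass", f"class {cls.__name__}:"]
    lines.append(textwrap.indent(f'"""{doc}"""', "    "))
    for field in fields(cls):
        # field.type is a class here, but may be a string under postponed annotations
        type_name = getattr(field.type, "__name__", str(field.type))
        lines.append(f"    {field.name}: {type_name}")
    return "\n".join(lines)


def build_prompt(schema, text: str) -> str:
    """Assemble guidelines and input text; the layout is an assumed template."""
    definitions = "\n\n".join(render_class(cls) for cls in schema)
    return (
        "# The following lines describe the task definition\n"
        f"{definitions}\n\n"
        "# This is the text to analyze\n"
        f"text = {text!r}\n\n"
        "# The annotation instances found in the text are listed here\n"
        "result = ["
    )


prompt = build_prompt(
    [Launcher, Mission],
    "The Ares 3 mission to Mars is scheduled for 2032.",
)
print(prompt)
```

Ending the prompt with an open `result = [` invites the model to complete a Python list of schema instances. Changing the task then amounts to editing the classes and their docstrings rather than retraining.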
Good for
- Developers needing to extract specific, structured data from unstructured text using custom, evolving schemas.
- Applications requiring flexible and adaptable information extraction without extensive re-training for new tasks.
- Research and development in advanced zero-shot learning for IE, where explicit guideline adherence is critical.
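For developers who need structured data out of the model, one possible post-processing step is sketched below. It assumes the completion is a Python list of constructor calls for schema classes, and it parses that text with the standard `ast` module instead of `eval`. The classes `Person` and `Organization`, the `ALLOWED` registry, the `parse_result` helper, and the sample completion string are all hypothetical and are not taken from the model's documentation.

```python
import ast
from dataclasses import dataclass


# Hypothetical schema classes; in practice these match the guidelines sent to the model.
@dataclass
class Person:
    """A named individual mentioned in the text."""

    span: str


@dataclass
class Organization:
    """A company, institution, or other organized group."""

    span: str


# Only classes listed here may be instantiated from model output.
ALLOWED = {"Person": Person, "Organization": Organization}


def parse_result(completion: str) -> list:
    """Turn text like '[Person(span="Ada")]' into dataclass instances without eval."""
    tree = ast.parse(completion.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list literal")

    instances = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected constructor calls only")
        cls = ALLOWED.get(node.func.id)
        if cls is None:
            continue  # ignore labels that are not part of the schema
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only literal keyword arguments are supported")
        # literal_eval accepts AST nodes and rejects anything that is not a literal
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        instances.append(cls(**kwargs))
    return instances


# Made-up completion used only to exercise the parser.
sample = '[Person(span="Ada Lovelace"), Organization(span="HiTZ"), Unknown(span="x")]'
parsed = parse_result(sample)
print(parsed)
```

Restricting instantiation to an allow-list and to literal keyword values keeps arbitrary generated code from being executed. Labels outside the schema are dropped here, though raising an error would be an equally reasonable choice.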