HiTZ/GoLLIE-7B

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Sep 25, 2023 · License: llama2 · Architecture: Transformer · Open weights

HiTZ/GoLLIE-7B is a 7 billion parameter Large Language Model developed by the HiTZ Basque Center for Language Technology, fine-tuned from CODE-LLaMA2. It is designed specifically for Information Extraction tasks and excels at following annotation guidelines provided on the fly. The model lets users perform zero-shot information extraction with custom-defined schemas, outperforming previous approaches by leveraging detailed definitions rather than relying solely on knowledge pre-encoded in the LLM.


GoLLIE-7B: Guideline-Following LLM for Information Extraction

GoLLIE-7B, developed by the HiTZ Basque Center for Language Technology, is a 7 billion parameter Large Language Model (LLM) fine-tuned from CODE-LLaMA2. Its core innovation lies in its ability to follow user-defined annotation guidelines for Information Extraction (IE) tasks, enabling zero-shot performance. Unlike traditional methods, GoLLIE-7B processes detailed definitions and schemas provided at inference time, allowing for highly flexible and dynamic IE without prior training on specific datasets.

Key Capabilities

  • Zero-Shot Information Extraction: Performs IE on unseen tasks by interpreting user-defined annotation guidelines.
  • Dynamic Schema Definition: Users can define extraction schemas on the fly using Python classes and docstrings.
  • Improved Performance: Outperforms previous zero-shot IE approaches by leveraging explicit guidelines.
  • English Language Support: Optimized for English NLP tasks.
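To make the schema-as-code idea concrete, here is a minimal sketch of how GoLLIE-style entity definitions can be declared. The class names, fields, and docstring wording below are illustrative, not taken from the official guidelines; the key pattern is that each Python class carries its annotation guideline in its docstring, which the model reads at inference time.

```python
from dataclasses import dataclass

# Illustrative entity definitions in the GoLLIE style: the docstring is the
# annotation guideline, and each field documents what span to extract.
@dataclass
class Launcher:
    """Refers to a vehicle designed primarily to transport payloads
    from Earth's surface into space, such as a rocket."""
    mention: str  # the text span naming the launcher, e.g. "Saturn V"

@dataclass
class Mission:
    """Refers to a spaceflight undertaking with a defined objective."""
    mention: str  # the text span naming the mission, e.g. "Apollo 11"

# The set of classes passed to the model defines the extraction schema.
ENTITY_DEFINITIONS = [Launcher, Mission]
```

Because the schema is ordinary Python, it can be versioned, reviewed, and swapped per request without retraining the model.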

When to Use GoLLIE-7B

GoLLIE-7B is ideal for developers and researchers needing to extract structured information from unstructured text, especially when:

  • Custom Extraction is Required: You need to define specific entities and relationships not covered by pre-trained models.
  • Rapid Prototyping: Quickly set up and test information extraction pipelines without extensive dataset labeling.
  • Zero-Shot Scenarios: Applying IE to domains or tasks where labeled training data is scarce or non-existent.
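The scenarios above all follow the same workflow: serialize the schema and the input text into a single prompt and let the model complete the annotation. The sketch below shows a hypothetical prompt assembly in that spirit; the exact template, section comments, and completion cue used during fine-tuning are documented in the official GoLLIE notebooks, so treat this layout as an assumption.

```python
# Schema serialized as source text; in practice this would be generated
# from the user's Python class definitions.
SCHEMA = '''@dataclass
class Person:
    """Refers to an individual human being mentioned by name."""
    mention: str
'''

def build_prompt(schema: str, text: str) -> str:
    # Illustrative layout: guidelines first, then the text to analyze,
    # then a cue for the model to emit a list of entity instances.
    return (
        "# The following lines describe the task definition\n"
        f"{schema}\n"
        "# This is the text to analyze\n"
        f'text = "{text}"\n'
        "# The annotations are listed here\n"
        "result ="
    )

prompt = build_prompt(SCHEMA, "Marie Curie won two Nobel Prizes.")

# To run inference with the published checkpoint (sketch only; downloads
# ~14 GB of weights):
# from transformers import AutoTokenizer, AutoModelForCausalLM
# tokenizer = AutoTokenizer.from_pretrained("HiTZ/GoLLIE-7B")
# model = AutoModelForCausalLM.from_pretrained("HiTZ/GoLLIE-7B", torch_dtype="auto")
```

The model's completion after `result =` is itself Python (a list of instantiated entity classes), so the output can be parsed back into the very dataclasses that defined the schema.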

For detailed usage and examples, refer to the GoLLIE Notebooks.