KaraKaraWitch/HiTZ-GoLLIE-13B-AsSafeTensors

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | License: llama2 | Architecture: Transformer | Open Weights | Warm

HiTZ-GoLLIE-13B-AsSafeTensors is a 13-billion-parameter text generation model developed by the HiTZ Basque Center for Language Technology and fine-tuned from CODE-LLaMA2. It is designed for zero-shot Information Extraction that follows annotation guidelines, so users can define schemas on the fly. It extracts structured information from text based on detailed, user-provided definitions rather than relying solely on knowledge already encoded in the LLM.

GoLLIE: Guideline-following Large Language Model for Information Extraction

GoLLIE (Guideline-following Large Language Model for Information Extraction) is a 13-billion-parameter model developed by the HiTZ Basque Center for Language Technology and fine-tuned from CODE-LLaMA2. Its core innovation is performing zero-shot Information Extraction by following user-defined annotation guidelines. Rather than depending only on knowledge already encoded in the model, GoLLIE reads detailed definitions and instructions, written as Python classes and docstrings, and extracts information according to them.
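
A minimal sketch of what guidelines written as Python classes and docstrings can look like, and how they might be assembled into a prompt. The `Entity` base class, the `Person` and `Organization` labels, the `build_prompt` helper, and the comment lines inside the prompt are all assumptions made for illustration. This page does not specify the prompt layout the model expects, so check it against the GoLLIE project before relying on it.

```python
import json

# Hypothetical guidelines: each class is a label and each docstring is its
# annotation guideline. They are kept as source text because the model
# reads them as part of the prompt.
GUIDELINES = '''
@dataclass
class Person(Entity):
    """A named human being. Do not annotate pronouns or job titles."""

    span: str  # The exact text of the mention


@dataclass
class Organization(Entity):
    """A company, institution, or agency that is referred to by name."""

    span: str  # The exact text of the mention
'''


def build_prompt(guidelines: str, text: str) -> str:
    """Join guidelines and input text into one code-style prompt (assumed layout)."""
    return (
        "# The following lines describe the task definition\n"
        f"{guidelines.strip()}\n\n"
        "# This is the text to analyze\n"
        f"text = {json.dumps(text)}\n\n"
        "# The annotation instances that take place in the text above are listed here\n"
        "result = ["
    )


prompt = build_prompt(GUIDELINES, "Ada Lovelace wrote to the Royal Society.")
print(prompt)
```

Under this scheme, changing the schema means editing the class definitions and docstrings in `GUIDELINES`; the rest of the prompt stays the same.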

Key Capabilities

  • Zero-Shot Information Extraction: Outperforms previous approaches in extracting structured data without prior examples for specific tasks.
  • Dynamic Schema Definition: Users can define annotation schemas on the fly using Python classes and docstrings, offering high flexibility.
  • Guideline Adherence: Relies on explicit guidelines for extraction, making it robust to novel or complex information extraction tasks.
  • Performance: The 13B variant achieves a zero-shot average F1 score of 56.0.
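
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it over exact-match (label, span) pairs. That matching rule and the toy data are assumptions for illustration; this page does not state how the reported 56.0 average was computed.

```python
def f1_score(predicted: set, gold: set) -> float:
    """Exact-match F1 between predicted and gold annotations."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this example.
gold = {("Person", "Ada Lovelace"), ("Organization", "Royal Society")}
predicted = {("Person", "Ada Lovelace"), ("Organization", "Society")}

score = f1_score(predicted, gold)  # precision 0.5, recall 0.5
print(f"{score:.2f}")
```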

Good For

  • Custom Information Extraction: Ideal for scenarios requiring extraction of specific entities or relations where predefined schemas are not available or need frequent modification.
  • Research and Development: Useful for exploring new information extraction tasks and methodologies due to its flexible guideline-following approach.
  • Structured Data Generation: Can be leveraged to transform unstructured text into structured formats based on user-defined rules.
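
Turning the model's output into structured records could look like the sketch below. It assumes the model completes a Python list of constructor calls such as `Person(span="...")`. That output shape, the `parse_result` helper, and the sample completion are assumptions for illustration, not output captured from the model. The text is parsed with `ast` rather than executed, so untrusted output is never run as code.

```python
import ast


def parse_result(completion: str) -> list:
    """Parse text like '[Person(span="x")]' into (label, fields) tuples."""
    tree = ast.parse(completion.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list of annotations")
    records = []
    for node in tree.body.elts:
        # Skip anything that is not a plain call such as Label(field=value).
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        # literal_eval accepts only literal values and raises ValueError otherwise.
        fields = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        records.append((node.func.id, fields))
    return records


# Invented completion, standing in for real model output.
completion = '''[
    Person(span="Ada Lovelace"),
    Organization(span="Royal Society"),
]'''

records = parse_result(completion)
print(records)
```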
