numind/NuExtract
Hugging Face
Text generation | Concurrency cost: 1 | Model size: 4B | Quant: BF16 | Context length: 4k | Published: May 31, 2024 | License: MIT | Architecture: Transformer | 0.2K | Open weights | Warm

NuExtract by NuMind is a 4-billion-parameter language model fine-tuned from Microsoft's Phi-3-mini-4k-instruct for structured information extraction. Given a JSON template, it extracts data from text into that format. The model is purely extractive: every extracted value is meant to appear verbatim in the original input. With a 4,096-token context length, it suits precise data retrieval tasks.
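You can check after generation whether a response actually respects the extractive property. The sketch below is a minimal illustration. It assumes the model output has already been decoded into a JSON string. `iter_strings` and `non_extractive_values` are hypothetical helper names, not part of any NuMind or Hugging Face API. The code walks every string value and reports any that do not occur verbatim in the input.

```python
import json


def iter_strings(node):
    """Yield every string leaf in a parsed JSON value (dicts, lists, scalars)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans and None carry no extracted text, so they are skipped.


def non_extractive_values(source_text, output_json):
    """Return extracted strings that do NOT appear verbatim in source_text.

    Hypothetical helper. Empty strings are ignored, since an empty value
    usually means "not found" rather than extracted text.
    """
    data = json.loads(output_json)
    return [s for s in iter_strings(data) if s and s not in source_text]


if __name__ == "__main__":
    text = "NuExtract was released by NuMind under the MIT license."
    good = '{"organization": "NuMind", "license": "MIT", "model": ["NuExtract"]}'
    print(non_extractive_values(text, good))  # []
    bad = '{"organization": "NuMind Inc."}'
    print(non_extractive_values(text, bad))  # ['NuMind Inc.']
```

The check is an exact substring test, so it also flags values that differ from the source only in whitespace or casing. Whether to normalize text before comparing is a choice left to the surrounding pipeline.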


NuExtract: Specialized Information Extraction

NuExtract, developed by NuMind, is a 4 billion parameter model built upon Microsoft's Phi-3-mini-4k-instruct architecture. It is specifically fine-tuned on a high-quality synthetic dataset to perform structured information extraction from text.

Key Capabilities

  • Purely Extractive: Designed so that all extracted text appears verbatim in the original input, which limits hallucinated information.
  • JSON Template-Driven: Users provide a JSON template to define the desired output structure, enabling precise and customizable data extraction.
  • Example-Based Guidance: Supports providing output formatting examples to further refine extraction accuracy for complex tasks.
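  • Context Length: Processes up to 4,096 tokens. This budget covers the prompt (template, examples, and input text) together with the generated output, so longer documents need to be split into chunks.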
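The template-plus-examples workflow amounts to assembling a single prompt. This is a minimal sketch that assumes the prompt layout we recall from the NuExtract model card: `<|input|>`, `### Template:`, `### Example:`, `### Text:`, `<|output|>`, with the answer terminated by `<|end-output|>`. Confirm these markers against the card before relying on them. `build_prompt` and `parse_completion` are hypothetical helpers. Model loading and generation are deliberately left out.

```python
import json

# ASSUMPTION: markers follow the prompt layout recalled from the NuExtract
# model card; verify them against the card before relying on them.
INPUT_TAG = "<|input|>"
OUTPUT_TAG = "<|output|>"
END_OUTPUT_TAG = "<|end-output|>"


def build_prompt(template, text, examples=()):
    """Assemble an extraction prompt from a JSON template, optional examples and text.

    `template` and each example may be JSON strings or Python dicts; they are
    re-serialized with indent=4 so formatting is consistent.
    """
    def normalize(obj):
        if isinstance(obj, str):
            obj = json.loads(obj)  # fail early on malformed JSON
        return json.dumps(obj, indent=4)

    parts = [f"{INPUT_TAG}\n### Template:\n{normalize(template)}\n"]
    for example in examples:
        parts.append(f"### Example:\n{normalize(example)}\n")
    parts.append(f"### Text:\n{text}\n{OUTPUT_TAG}\n")
    return "".join(parts)


def parse_completion(completion):
    """Parse the JSON answer from a decoded completion, with or without the echoed prompt."""
    answer = completion.split(OUTPUT_TAG, 1)[-1]  # drop the echoed prompt, if present
    answer = answer.split(END_OUTPUT_TAG, 1)[0]   # drop anything after the end marker
    return json.loads(answer)


if __name__ == "__main__":
    template = {"name": "", "license": ""}
    prompt = build_prompt(template, "NuExtract is released under the MIT license.")
    print(prompt)
    # Stand-in for a decoded model completion that echoes the prompt.
    completion = prompt + '{"name": "NuExtract", "license": "MIT"}' + END_OUTPUT_TAG
    print(parse_completion(completion))  # {'name': 'NuExtract', 'license': 'MIT'}
```

In practice, the prompt would be tokenized and passed to the model's generation call, and the decoded text would then go to `parse_completion`. Keeping the template and examples compact leaves more of the 4,096-token window for the input text.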

Good For

  • Structured Data Extraction: Ideal for converting unstructured text into structured JSON formats.
  • Automated Data Processing: Useful for tasks requiring the precise retrieval of specific entities or facts from documents.
  • Customizable Extraction: Adapts to diverse extraction needs through user-defined JSON schemas.
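The user's template fixes the output structure, so a downstream pipeline can reject responses whose shape drifts from it. Below is a minimal sketch of such a check. The matching rules are our own assumptions for illustration, not a documented NuExtract contract: objects must have exactly the template's keys, an empty template list means a list of strings, and other template values mean a string. `template_mismatches` is a hypothetical helper.

```python
def template_mismatches(template, output, path="$"):
    """Return human-readable paths where `output` departs from `template`.

    Hypothetical conventions, for illustration only:
      * a dict in the template requires a dict with exactly the same keys;
      * a list requires a list; if the template list is non-empty, every output
        item must match its first element, otherwise items must be strings;
      * any other template value (typically "") requires a string.
    """
    problems = []
    if isinstance(template, dict):
        if not isinstance(output, dict):
            return [f"{path}: expected object"]
        # Walk template keys in order so results are deterministic.
        for key, sub_template in template.items():
            if key not in output:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(
                    template_mismatches(sub_template, output[key], f"{path}.{key}")
                )
        for key in output:
            if key not in template:
                problems.append(f"{path}.{key}: not in template")
    elif isinstance(template, list):
        if not isinstance(output, list):
            return [f"{path}: expected array"]
        item_template = template[0] if template else ""
        for i, item in enumerate(output):
            problems.extend(template_mismatches(item_template, item, f"{path}[{i}]"))
    elif not isinstance(output, str):
        problems.append(f"{path}: expected string")
    return problems


if __name__ == "__main__":
    template = {"model": {"name": "", "license": ""}, "variants": []}
    output = {"model": {"name": "NuExtract"}, "variants": ["NuExtract-tiny", "NuExtract-large"]}
    print(template_mismatches(template, output))  # ['$.model.license: missing']
```

Returning a list of paths, rather than a single boolean, makes it easy to log exactly which fields a response got wrong before retrying or discarding it.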

NuMind also offers a smaller 0.5B version, NuExtract-tiny, and a larger 7B version, NuExtract-large, to suit different compute budgets and accuracy requirements.