numind/NuExtract

Parameters: 4B
Precision: BF16
Context length: 4096 tokens
License: MIT
Overview

NuExtract: Specialized Information Extraction

NuExtract, developed by NuMind, is a model of roughly 4 billion parameters fine-tuned from Microsoft's Phi-3-mini-4k-instruct. It is trained on a high-quality synthetic dataset to perform structured information extraction from text.

Key Capabilities

  • Purely Extractive: The model is designed so that every extracted value appears verbatim in the original input, which limits hallucinated information.
  • JSON Template-Driven: Users provide a JSON template that defines the desired output structure, so the fields to extract are stated explicitly.
  • Example-Based Guidance: Accepts optional example outputs that show the desired formatting, which can improve extraction accuracy on complex tasks.
  • Context Length: Works within a 4096-token context window, which the template, any examples, the input text, and the generated output all share.
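
To make the template-driven workflow concrete, here is a minimal Python sketch of how a prompt might be assembled from a template, optional examples, and the input text. The `<|input|>` / `### Template:` / `### Example:` / `### Text:` / `<|output|>` layout is an assumption based on the format commonly shown for NuExtract and should be checked against the official model card before use. The helper name `build_prompt` and the sample template fields are hypothetical.

```python
import json

def build_prompt(text, template, examples=()):
    """Assemble a NuExtract-style prompt.

    The section markers used here are an assumption; verify them
    against the official model card before relying on this layout.
    """
    # The template is an ordinary dict whose empty strings mark fields to fill.
    parts = ["<|input|>\n### Template:\n" + json.dumps(template, indent=4) + "\n"]
    # Optional filled-in templates that demonstrate the desired formatting.
    for example in examples:
        parts.append("### Example:\n" + json.dumps(example, indent=4) + "\n")
    parts.append("### Text:\n" + text + "\n<|output|>\n")
    return "".join(parts)

# Hypothetical template and input, for illustration only.
template = {"Model": {"Name": "", "Number of parameters": "", "Base model": ""}}
text = "NuExtract is a 4 billion parameter model based on Phi-3-mini-4k-instruct."
prompt = build_prompt(text, template)
print(prompt)
```

The resulting string would then be tokenized and passed to the model; because the whole prompt shares the 4096-token window with the output, long templates or many examples leave less room for the input text.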

Good For

  • Structured Data Extraction: Ideal for converting unstructured text into structured JSON formats.
  • Automated Data Processing: Useful for tasks requiring the precise retrieval of specific entities or facts from documents.
  • Customizable Extraction: Adapts to diverse extraction needs through user-defined JSON templates.
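
In an automated pipeline, the purely extractive property can be checked rather than assumed. The sketch below parses a model response as JSON and reports any string value that does not appear verbatim in the source text. The function names and the sample response are hypothetical; the response is hand-written for illustration and is not real model output.

```python
import json

def leaf_strings(value):
    """Yield every non-empty string leaf in a nested JSON value."""
    if isinstance(value, str):
        if value:
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from leaf_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from leaf_strings(item)

def non_verbatim_values(output_json, source_text):
    """Return extracted strings that are not exact substrings of the source."""
    data = json.loads(output_json)  # raises ValueError on malformed JSON
    return [s for s in leaf_strings(data) if s not in source_text]

# Hand-written stand-in for a model response (not real output).
source = "NuExtract is released by NuMind under the MIT license."
raw = '{"Model": {"Name": "NuExtract", "Developer": "NuMind", "License": "Apache 2.0"}}'
print(non_verbatim_values(raw, source))  # → ['Apache 2.0']
```

Empty strings are skipped because an unfilled template field is a valid result. A non-empty list signals a value worth reviewing or rejecting before it enters downstream systems.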

NuMind also offers smaller (0.5B) and larger (7B) versions of this model, NuExtract-tiny and NuExtract-large, respectively, to cater to different computational and performance requirements.