Overview
NuExtract: Specialized Information Extraction
NuExtract, developed by NuMind, is a fine-tune of Microsoft's Phi-3-mini-4k-instruct with roughly 3.8 billion parameters. It is trained on a high-quality synthetic dataset to perform structured information extraction from text.
Key Capabilities
- Purely Extractive: Designed to copy extracted values verbatim from the original input rather than generate new text, which limits hallucinated information.
- JSON Template-Driven: Users provide a JSON template to define the desired output structure, enabling precise and customizable data extraction.
- Example-Based Guidance: Accepts example outputs alongside the template to improve extraction accuracy on complex tasks.
- Context Length: Processes inputs up to 4096 tokens, so longer documents must be split into chunks before extraction.
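To illustrate the template-driven workflow described above, here is a minimal sketch of assembling a prompt from a JSON template, optional example outputs, and the input text. The helper name `build_prompt` is hypothetical, and the section markers (`<|input|>`, `### Template:`, `### Example:`, `### Text:`, `<|output|>`) are an assumption about the prompt layout; check the NuExtract model card for the exact format before relying on it.

```python
import json
from typing import List, Optional


def build_prompt(template: dict, text: str,
                 examples: Optional[List[dict]] = None) -> str:
    """Assemble an extraction prompt (hypothetical helper).

    The marker strings are an assumed layout, not a confirmed
    specification of the model's expected input.
    """
    parts = ["<|input|>", "### Template:", json.dumps(template, indent=4)]
    # Each optional example is a filled-in copy of the template.
    for example in examples or []:
        parts += ["### Example:", json.dumps(example, indent=4)]
    # The trailing empty string makes the prompt end with a newline.
    parts += ["### Text:", text, "<|output|>", ""]
    return "\n".join(parts)


# Empty strings in the template mark the fields to be filled in.
template = {"Model": "", "Developer": ""}
text = "NuExtract was developed by NuMind."
prompt = build_prompt(template, text)
```

The resulting string would then be tokenized and passed to the model's generation call. Note that the template and any examples count toward the 4096-token limit along with the text itself.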
Good For
- Structured Data Extraction: Ideal for converting unstructured text into structured JSON formats.
- Automated Data Processing: Useful for tasks requiring the precise retrieval of specific entities or facts from documents.
- Customizable Extraction: Adapts to diverse extraction needs through user-defined JSON schemas.
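Because the model is intended to be purely extractive, a downstream pipeline can cheaply check that property on each result. The following is a minimal sketch, assuming the model's output has already been isolated as a JSON string; the function names `iter_strings` and `non_verbatim_values` are hypothetical helpers, not part of any NuExtract tooling.

```python
import json


def iter_strings(value):
    """Yield every non-empty string leaf in a nested JSON value."""
    if isinstance(value, str):
        if value:
            yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)


def non_verbatim_values(output_json: str, source_text: str) -> list:
    """Return extracted strings that do not occur verbatim in source_text."""
    data = json.loads(output_json)
    return [s for s in iter_strings(data) if s not in source_text]


source = "NuExtract was developed by NuMind."
good = '{"Model": "NuExtract", "Developer": "NuMind", "License": ""}'
bad = '{"Model": "NuExtract", "Developer": "Microsoft"}'

suspicious = non_verbatim_values(bad, source)
```

An empty result means every extracted value was found in the source text. Any returned strings can be flagged for review or discarded, depending on how strict the pipeline needs to be.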
NuMind also offers two other sizes of this model: NuExtract-tiny (0.5B parameters) for tighter compute budgets and NuExtract-large (7B parameters) for higher performance requirements.