Overview
NuExtract-tiny: Specialized Information Extraction Model
NuExtract-tiny, developed by NuMind, is a 0.6-billion-parameter model built on the Qwen1.5-0.5B architecture. It is fine-tuned on a proprietary, high-quality synthetic dataset specifically for extracting structured information from text.
Key Capabilities
- Purely Extractive: Designed so that every extracted value appears verbatim in the original input text, which limits hallucination.
- Template-Driven Extraction: Users can define the desired output structure using a JSON template, guiding the model to extract specific fields.
- Example-Based Guidance: Accepts example outputs that illustrate the expected format, which can improve extraction accuracy on complex tasks.
- Input Length: Trained on input texts of up to 2000 tokens; the maximum context length is 32768 tokens.
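The template and example inputs described above are combined into a single prompt string. The sketch below assumes the layout used in NuMind's published NuExtract usage example (the `<|input|>` / `<|output|>` markers and the `### Template:`, `### Example:`, `### Text:` section headers); those markers are reproduced here as assumptions and should be checked against the official model card. The `build_prompt` helper itself is hypothetical.

```python
import json
from typing import List, Optional

# Assumed prompt markers, modeled on NuMind's published NuExtract example.
# Verify them against the official model card before relying on them.
INPUT_TAG = "<|input|>"
OUTPUT_TAG = "<|output|>"


def build_prompt(text: str, template: dict, examples: Optional[List[dict]] = None) -> str:
    """Assemble a NuExtract-style prompt (hypothetical helper).

    template: JSON template whose keys name the fields to extract; values are
              empty strings (or empty lists) that the model fills in.
    examples: optional filled-in instances of the template that show the
              expected output format.
    """
    parts = [INPUT_TAG, "### Template:", json.dumps(template, indent=4)]
    for example in examples or []:
        # Each example is serialized with the same indentation as the template.
        parts += ["### Example:", json.dumps(example, indent=4)]
    # The trailing empty string makes the prompt end with a newline after OUTPUT_TAG.
    parts += ["### Text:", text, OUTPUT_TAG, ""]
    return "\n".join(parts)


# Usage: a two-field template with no examples.
template = {"Model": {"Name": "", "Developer": ""}}
prompt = build_prompt("NuExtract-tiny was released by NuMind.", template)
print(prompt)
```

The resulting string would then be tokenized and passed to the model for generation. Keeping the input text within the roughly 2000-token range the model was trained on is the safer choice, even though the context window is larger.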
Good For
- Zero-shot Information Extraction: Performs well out of the box on a range of extraction tasks.
- Task-Specific Fine-tuning: Intended to be fine-tuned further on a specific use case; as few as 30 examples can be enough to reach the best performance on that task.
- Structured Data Retrieval: Ideal for converting unstructured text into structured JSON outputs based on predefined schemas.
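Once the model returns JSON, a cheap way to confirm the purely extractive behavior on your own data is to check that every extracted string occurs verbatim in the input. The sketch below assumes the generated text follows an `<|output|>` marker and may end with an `<|end-output|>` marker (both taken as assumptions from NuMind's published example); the helpers `extract_json`, `leaf_strings`, and `non_extractive_values`, and the raw output string in the usage lines, are hypothetical.

```python
import json
from typing import Any, List


def extract_json(generated: str) -> Any:
    """Parse the JSON result from raw model output (hypothetical helper).

    Assumes the answer follows an "<|output|>" marker and may be terminated
    by an "<|end-output|>" marker; verify both against the model card.
    """
    body = generated.split("<|output|>")[-1]
    body = body.split("<|end-output|>")[0]
    return json.loads(body)


def leaf_strings(node: Any) -> List[str]:
    """Collect every non-empty string value in a nested dict/list structure."""
    if isinstance(node, dict):
        return [s for v in node.values() for s in leaf_strings(v)]
    if isinstance(node, list):
        return [s for v in node for s in leaf_strings(v)]
    if isinstance(node, str) and node:
        return [node]
    return []


def non_extractive_values(result: Any, text: str) -> List[str]:
    """Return extracted values that do not appear verbatim in the input text."""
    return [s for s in leaf_strings(result) if s not in text]


# Usage with a made-up model output (not real NuExtract-tiny output).
text = "NuExtract-tiny was released by NuMind."
raw = '<|output|>\n{"Model": {"Name": "NuExtract-tiny", "Developer": "NuMind"}}<|end-output|>'
result = extract_json(raw)
print(non_extractive_values(result, text))
```

An empty list means every extracted value was found in the source text; any returned strings point to values worth inspecting by hand.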