numind/NuExtract-1.5
NuExtract-1.5 by NuMind is a 4-billion-parameter language model fine-tuned from Phi-3.5-mini-instruct specifically for structured information extraction. It handles long documents (up to 20k tokens) and supports multiple languages, including English, French, Spanish, German, Portuguese, and Italian. The model prioritizes pure extraction, meaning generated text should be present in the original source, which makes it well suited to precise data retrieval tasks.
NuExtract-1.5: Multilingual Structured Information Extraction
NuExtract-1.5, developed by NuMind, is a 4-billion-parameter model fine-tuned from Phi-3.5-mini-instruct. Its core strength is structured information extraction from text: a JSON template defines the desired output, and the model fills it with values taken from the input.
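As a rough illustration of the template-based approach, the sketch below assembles a prompt from a JSON template and an input text. The marker strings (`<|input|>`, `### Template:`, `### Text:`, `<|output|>`), the convention that empty strings and empty lists mark the fields to fill, and the helper name `build_prompt` are assumptions made for this example; the official model card should be checked for the exact prompt layout.

```python
import json


def build_prompt(template: dict, text: str) -> str:
    """Assemble an extraction prompt from a JSON template and an input text.

    The marker layout below is an assumed format, not a confirmed one.
    """
    template_json = json.dumps(template, indent=4)
    return (
        "<|input|>\n"
        f"### Template:\n{template_json}\n"
        f"### Text:\n{text}\n\n"
        "<|output|>"
    )


# Assumed convention: "" marks a string to extract, [] marks a list of values.
template = {
    "model": {"name": "", "parameter_count": "", "base_model": ""},
    "languages": [],
}
text = "NuExtract-1.5 is a 4 billion parameter model fine-tuned from Phi-3.5-mini-instruct."

prompt = build_prompt(template, text)
print(prompt)
```

The model would then be expected to complete the prompt with a JSON object that has the same shape as the template.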
Key Capabilities
- Multilingual Support: Extracts information from documents in English, French, Spanish, German, Portuguese, and Italian.
- Long Document Handling: Processes documents of up to 20,000 tokens, with a sliding-window prompting technique available for very long contexts.
- Pure Extraction Focus: Designed so that extracted text appears verbatim in the original input, which reduces the risk of hallucinated values.
- High Performance: NuMind's reported benchmarks show strong zero-shot performance on both English and multilingual extraction tasks, as well as effective handling of long contexts.
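The sliding-window idea mentioned above can be sketched as follows: the document is split into overlapping windows, and the result extracted so far is passed along with each new window so the model can extend it. This is a minimal sketch, not NuMind's implementation. The names `split_windows` and `sliding_extract` are hypothetical, whitespace-separated words stand in for model tokens, and `extract` is a caller-supplied function that would wrap the actual model call.

```python
from typing import Callable, Dict, List


def split_windows(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split a word list into windows of `size` words sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def sliding_extract(
    text: str,
    extract: Callable[[str, Dict], Dict],
    size: int = 1000,
    overlap: int = 100,
) -> Dict:
    """Run `extract` over each window, carrying the running result forward."""
    current: Dict = {}
    # Assumption: word counts approximate token counts closely enough for a sketch.
    for window in split_windows(text.split(), size, overlap):
        current = extract(" ".join(window), current)
    return current


def stub_extract(chunk: str, current: Dict) -> Dict:
    """Stand-in for a model call: collects capitalized words as 'names'."""
    names = list(current.get("names", []))
    for word in chunk.split():
        if word.istitle() and word not in names:
            names.append(word)
    return {"names": names}


result = sliding_extract(
    "alpha Bob beta gamma Carol delta epsilon Dave zeta eta",
    stub_extract,
    size=4,
    overlap=1,
)
print(result)
```

In a real pipeline, `extract` would build a prompt from the template, the current partial result, and the window text, then parse the model's JSON output. A tokenizer-based splitter would replace the word-based one.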
Good For
- Developers needing to extract specific, structured data from diverse textual sources.
- Applications requiring high precision in information retrieval where output must strictly adhere to the input text.
- Use cases involving processing lengthy documents or multilingual content for data extraction.
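For applications where output must adhere strictly to the input text, a simple post-check can confirm that every extracted string actually occurs in the source. The helper below is a hypothetical sketch (the name `unsupported_values` is not part of any NuMind tooling). It assumes the model output has already been parsed into nested dicts and lists, and it uses exact substring matching, which will flag values the model normalized or reformatted.

```python
from typing import Any, List


def unsupported_values(output: Any, source: str) -> List[str]:
    """Return extracted strings that do not appear verbatim in `source`."""
    if isinstance(output, dict):
        return [v for child in output.values() for v in unsupported_values(child, source)]
    if isinstance(output, list):
        return [v for child in output for v in unsupported_values(child, source)]
    # Empty strings mean "nothing extracted" and are skipped;
    # non-string leaves (numbers, None) are ignored in this sketch.
    if isinstance(output, str) and output and output not in source:
        return [output]
    return []


source = "NuExtract-1.5 by NuMind is fine-tuned from Phi-3.5-mini-instruct."
output = {
    "model": {"name": "NuExtract-1.5", "developer": "NuMind"},
    "base": ["Phi-3.5-mini-instruct"],
    "license": "Apache",  # not present in the source, so it should be flagged
}

missing = unsupported_values(output, source)
print(missing)
```

Any value returned by the check can be dropped, logged, or sent for manual review, depending on how strict the application needs to be.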
NuMind recommends using a temperature setting at or near 0 for optimal extraction results.
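One way to apply this recommendation, assuming the model is run through Hugging Face `transformers`, is to treat a temperature of exactly 0 as greedy decoding (`do_sample=False`), since sampling with a temperature of 0 is not accepted there. The helper name `generation_kwargs` and the default `max_new_tokens` value are assumptions for illustration.

```python
def generation_kwargs(temperature: float = 0.0, max_new_tokens: int = 1000) -> dict:
    """Build keyword arguments for a `generate`-style call.

    A temperature of 0 maps to greedy decoding; small positive values
    enable sampling with that temperature.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: deterministic, equivalent to the temperature -> 0 limit.
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    return {
        "do_sample": True,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


greedy = generation_kwargs()
near_zero = generation_kwargs(temperature=0.1)
print(greedy)
print(near_zero)
```

With a loaded model and tokenized inputs, these settings would typically be passed as `model.generate(**inputs, **generation_kwargs())`.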