small-models-for-glam/index-card-extractor-4b-v0.1
The index-card-extractor-4b-v0.1 is a 4.5-billion-parameter open vision-language model developed by small-models-for-glam and built on NuExtract-3 (Qwen3.5-4B base). It is fine-tuned to convert images of historical index cards into structured JSON that follows a user-provided schema, including schemas it was not explicitly trained on. It is adapted to archival card collections and covers French and English cards across domains such as vital records and manuscript catalogues, for use by libraries, archives, and museums.
Overview
index-card-extractor-4b-v0.1 is a 4.5-billion-parameter open vision-language model fine-tuned from NuExtract-3 (Qwen3.5-4B base). It specializes in extracting structured JSON from images of historical index cards, such as catalogue, vital-record, and manuscript cards. Its key feature is that it follows user-defined JSON schemas supplied at inference time, including schemas it did not encounter during training.
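This card does not document the exact template syntax NuExtract-3 expects, so the Python sketch below simply assumes a flat JSON template whose values are type names ("string", "integer"), with hypothetical field names for a death-record card. It shows one client-side way to check that a raw model output parses as JSON and conforms to that template; it is not the model's own validation logic.

```python
import json

# Hypothetical template: field names and type strings are illustrative
# assumptions, not NuExtract-3's documented template format.
TEMPLATE = {
    "surname": "string",
    "given_names": "string",
    "date_of_death": "string",
    "place_of_death": "string",
    "age": "integer",
}

# Python types used to check each illustrative type string.
PY_TYPES = {"string": str, "integer": int}


def conforms(raw_output: str, template: dict) -> bool:
    """True if raw_output is a JSON object with exactly the template's keys
    and every non-null value has the expected type."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(template):
        return False
    for key, type_name in template.items():
        value = data[key]
        if value is None:  # null is allowed for information absent from the card
            continue
        expected = PY_TYPES[type_name]
        # bool is a subclass of int, so reject it explicitly for integer fields
        if expected is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


good = ('{"surname": "Martin", "given_names": "Jeanne", '
        '"date_of_death": "1893-04-12", "place_of_death": null, "age": 71}')
bad = '{"surname": "Martin", "age": "71"}'
print(conforms(good, TEMPLATE))  # True
print(conforms(bad, TEMPLATE))   # False: missing keys, and "71" is a string
```

Allowing null for any field is a design choice in this sketch: cards often omit information, and a schema-following extractor should be able to say so rather than invent a value.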
Key Capabilities
- Schema-driven Extraction: Converts card images into structured JSON according to a provided JSON template or Pydantic schema.
- Domain Adaptation: Adapted to handwritten and typed archival cards, including French and English death records and English manuscript-catalogue cards.
- Schema Generalization: Produced valid, schema-conforming JSON for 100% of outputs on unseen schemas and collections in the reported evaluation.
- Performance: Achieves an exact field-F1 of 0.887 on Teklia (French handwritten death records) and a manuscript-number F1 of 0.952 on NLS Advocates, outperforming zero-shot NuExtract-3 and, in some cases, the larger Qwen3-VL-8B.
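The Performance bullet reports exact field-F1 without defining it. A common field-level definition, assumed here rather than taken from the model's evaluation code, counts a field as a true positive when the predicted value exactly matches a non-empty gold value, a false positive when a non-empty prediction is wrong, and a false negative when a non-empty gold value is missed or mis-transcribed. A minimal Python sketch with hypothetical records:

```python
from typing import Optional


def _norm(value: object) -> Optional[str]:
    """Trim surrounding whitespace; treat None and empty strings as missing."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def field_f1(predictions: list, golds: list) -> float:
    """Micro-averaged exact-match F1 over all fields of aligned records."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must be aligned one-to-one")
    tp = fp = fn = 0
    for pred, gold in zip(predictions, golds):
        for name in pred.keys() | gold.keys():
            p, g = _norm(pred.get(name)), _norm(gold.get(name))
            if p is not None and p == g:
                tp += 1
                continue
            if p is not None:
                fp += 1  # a value was predicted but it is wrong or not in gold
            if g is not None:
                fn += 1  # a gold value was missed or mis-transcribed
    denom = 2 * tp + fp + fn
    # Convention: if neither side has any values, treat the score as perfect.
    return 2 * tp / denom if denom else 1.0


gold = [{"surname": "Martin", "place_of_death": "Lyon", "age": "71"}]
pred = [{"surname": "Martin", "place_of_death": "Lyons", "age": None}]
print(field_f1(pred, gold))  # 0.4  (tp=1, fp=1, fn=2)
```

Under this definition a wrong value counts against both precision and recall, so transcribing "Lyon" as "Lyons" costs more than leaving the field empty.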
Intended Use
This model is designed for libraries, archives, and museums to digitize card catalogues and index drawers into structured, ingestible records. It is best used as a first-pass extractor with human review rather than for generating production-ready ground truth without verification.
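Since the model is meant as a first-pass extractor, one possible workflow is to accept clean outputs automatically and queue the rest for a reviewer. The sketch below is hypothetical: the card identifiers, field names, and the ALWAYS_REVIEW set are illustrative, with the set chosen because the limitations flag place names and long free-text fields as the hardest cases.

```python
import json
from dataclasses import dataclass, field

# Hypothetical fields that always go to a reviewer (place names and long
# free text are listed as the hardest cases for this model).
ALWAYS_REVIEW = {"place_of_death", "notes"}


@dataclass
class Triage:
    accepted: list = field(default_factory=list)      # (card_id, record)
    needs_review: list = field(default_factory=list)  # (card_id, reason)


def triage(outputs: dict) -> Triage:
    """Split raw model outputs (card_id -> JSON text) into accepted records
    and cards that need human review."""
    result = Triage()
    for card_id, raw in outputs.items():
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            result.needs_review.append((card_id, "invalid JSON"))
            continue
        if not isinstance(record, dict):
            result.needs_review.append((card_id, "not a JSON object"))
            continue
        empty = sorted(k for k, v in record.items() if v in (None, "", []))
        reasons = [f"empty: {k}" for k in empty]
        reasons += [f"error-prone: {k}" for k in sorted(record.keys() & ALWAYS_REVIEW)]
        if reasons:
            result.needs_review.append((card_id, "; ".join(reasons)))
        else:
            result.accepted.append((card_id, record))
    return result


outputs = {
    "card-001": '{"surname": "Martin", "given_names": "Jeanne", "notes": null}',
    "card-002": '{"surname": "Dupont", "given_names": "Paul"}',
    "card-003": '{"surname": "Martin"',  # truncated output
}
t = triage(outputs)
print([c for c, _ in t.accepted])  # ['card-002']
print(t.needs_review)
```

Flagging empty fields is deliberately conservative: an empty value may be correct for a sparse card, but a reviewer is better placed than the pipeline to decide that.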
Limitations
- For two of the three training collections, labels are machine-generated silver labels rather than human-verified gold labels, which can cap achievable quality on free-text fields.
- Handwriting recognition remains challenging, particularly for place names and long free-text fields.
- Test sets are small, so reported numbers should be considered directional.
- Use greedy decoding with thinking mode disabled; the reasoning mode was not trained for this task.
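One generic way to see how much a small test set constrains a score is a percentile bootstrap over cards. The Python sketch below assumes per-card (true positive, false positive, false negative) field counts and micro-averaged F1 as the metric; neither is specified by this card, and the counts are hypothetical, not real results.

```python
import random
import statistics


def micro_f1(counts):
    """Micro F1 from per-card (tp, fp, fn) tuples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def bootstrap_ci95(counts, n_resamples=2000, seed=0):
    """Percentile bootstrap: resample cards with replacement, recompute F1,
    and return the 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    scores = [micro_f1(rng.choices(counts, k=len(counts))) for _ in range(n_resamples)]
    cuts = statistics.quantiles(scores, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical per-card field counts for a 10-card test set (not real results).
cards = [(5, 0, 0), (4, 1, 1), (5, 0, 0), (3, 1, 2), (5, 0, 0),
         (4, 0, 1), (5, 0, 0), (2, 2, 3), (5, 0, 0), (4, 1, 1)]
point = micro_f1(cards)
low, high = bootstrap_ci95(cards)
print(f"F1 = {point:.3f}, 95% bootstrap interval = [{low:.3f}, {high:.3f}]")
```

Resampling whole cards rather than individual fields keeps fields from the same card together, since errors on one hard card tend to be correlated. The interval's width shows how much the score could move with a different sample of cards.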