TEI Entity Linker: Qwen3-14B LoRA Adapter
This model is a LoRA adapter for the 14-billion-parameter Qwen3-14B base model, fine-tuned by Apokalyptikon to address a critical challenge in digital scholarly editing: linking named entities from historical TEI registers to authority files like Wikidata and GND. Given an entity (person, place, or organization) and a list of candidate records, it outputs a JSON verdict (MATCH, PARTIAL, or NONE) with a confidence score and a brief reason.
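The input/output contract can be sketched as follows. The field names, candidate IDs, and prompt structure here are illustrative assumptions, not the model's documented schema:

```python
import json

# Hypothetical single-entity payload: the entity as it appears in the TEI
# register, plus candidate records retrieved from Wikidata/GND.
# All field names and IDs below are placeholders, not the real schema.
entity = {
    "name": "Creveld",
    "type": "place",
    "context": "Brief an den Rat der Stadt Creveld, 1674",
}
candidates = [
    {"id": "wikidata:Q_EXAMPLE_1", "label": "Krefeld",
     "description": "city in North Rhine-Westphalia"},
    {"id": "wikidata:Q_EXAMPLE_2", "label": "Crefeld",
     "description": "steamship"},
]

# The model answers with a JSON verdict of roughly this shape:
raw_response = (
    '{"verdict": "MATCH", "candidate": "wikidata:Q_EXAMPLE_1", '
    '"confidence": 0.92, "reason": "Historical spelling variant of Krefeld."}'
)

result = json.loads(raw_response)
assert result["verdict"] in {"MATCH", "PARTIAL", "NONE"}
print(result["verdict"], result["candidate"])
```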
Key Capabilities
- Historical Entity Disambiguation: Accurately links entities by analyzing biographical data, descriptions, and context.
- Handles Historical Spelling: Recognizes and matches historical variants (e.g., Creveld → Krefeld, Coeln → Köln).
- Complex Entity Recognition: Distinguishes mythological figures from literary works, identifies ethnic groups, allegories, and personifications.
- Structured Output: Provides deterministic JSON output with verdicts, confidence scores, and reasons, suitable for automated pipelines.
- Optimized for TEI Data: Trained on real-world historical TEI editions (early modern German correspondence, classical philology registers).
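To illustrate the historical-spelling problem the model handles: a candidate-retrieval step upstream of the linker might use fuzzy string similarity with umlaut expansion, so that variants like Creveld/Krefeld or Coeln/Köln are not filtered out before the model ever sees them. This pre-filter is a sketch of one possible pipeline stage, not part of the adapter itself:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and expand German umlauts (a conventional pre-step)."""
    name = name.lower()
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        name = name.replace(src, dst)
    return name

def similar(a: str, b: str) -> float:
    """Similarity ratio between two normalized name forms (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Historical variants keep a high ratio despite divergent surface spelling.
print(round(similar("Creveld", "Krefeld"), 2))
print(round(similar("Coeln", "Köln"), 2))
```

A loose threshold (e.g. 0.6) keeps such variants in the candidate list and leaves the final disambiguation to the model.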
Training and Performance
The model was fine-tuned with LoRA on Apple Silicon via MLX, using a teacher-student approach in which Claude Sonnet 4 generated high-quality labeled training data. It was trained on 7,098 examples, each a single-entity prompt, which gave the best performance. The model demonstrates strong capabilities in:
- Biographical disambiguation: Correctly distinguishing individuals with the same name based on life dates.
- Geographic disambiguation: Differentiating locations with similar names (e.g., Dillingen an der Donau vs. Dillingen/Saar).
- Robust Rejection: Effectively returns NONE for underspecified or non-matching entries.
Good For
- Digital Humanities Projects: Specifically designed for researchers and developers working with historical TEI editions.
- Automated Entity Linking: Integrating into pipelines for verifying and enriching named entities in historical texts.
- Authority File Integration: Bridging historical documents with modern authority databases like Wikidata and GND.
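Bridging to the TEI source then amounts to writing an accepted authority ID into the register entry, e.g. as a `ref` attribute. A minimal sketch: the TEI namespace URI is standard, but the element choice and the placeholder Wikidata IRI are assumptions that depend on the edition:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

# A tiny register entry as it might appear before linking.
entry = ET.fromstring(f'<placeName xmlns="{TEI_NS}">Creveld</placeName>')

# Hypothetical accepted verdict from the linker (placeholder IRI).
verdict = {
    "verdict": "MATCH",
    "candidate": "http://www.wikidata.org/entity/Q_PLACEHOLDER",
    "confidence": 0.92,
}

if verdict["verdict"] == "MATCH":
    # TEI convention: point at the authority record via @ref.
    entry.set("ref", verdict["candidate"])

print(ET.tostring(entry, encoding="unicode"))
```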
This adapter requires an Apple Silicon Mac for MLX inference; the 4-bit quantized model needs approximately 10 GB of RAM.