muthugsubramanian/DocWain-14B-v2-unified

Text Generation | Concurrency Cost: 1 | Model Size: 14B | Quant: FP8 | Ctx Length: 32k | Published: Apr 29, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

DocWain-14B-v2-unified by muthugsubramanian is a 14-billion-parameter enterprise document intelligence agent built on a vision-grafted Qwen3-14B base model, with a 32,768-token context length. It is fine-tuned for extraction, analysis, comparison, and grounded response generation across enterprise document types. Identity and behavioral discipline are trained into the weights: the model quotes evidence verbatim, refuses to answer when the data is missing, and preserves currency symbols.


DocWain-14B-v2-unified: Enterprise Document Intelligence Agent

DocWain-14B-v2-unified is a 14-billion-parameter model developed by muthugsubramanian and designed as an enterprise document intelligence agent. Built on a vision-grafted Qwen3-14B base, this unified variant has a 32,768-token context length. It was fine-tuned with LoRA supervised fine-tuning (SFT) on synthetic data to embed identity, capability awareness, and strict behavioral discipline directly into its weights.

Key Capabilities

  • Accurate Extraction: Excels at extracting information from diverse enterprise documents like invoices, contracts, resumes, policies, and research papers.
  • Document Intelligence: Provides summaries, identifies key findings, uncovers cross-document relationships, and surfaces anomalies.
  • Layout and Context Understanding: Comprehends complex document structures including tables, charts, and multi-page references.
  • Grounded Response Generation: Quotes evidence verbatim and explicitly states "not specified in the documents" when information is absent, rather than fabricating an answer.
  • Behavioral Discipline: Preserves currency symbols as written (e.g., ₹/£/$) and does not invent skills or experience that are absent from the source documents.
  • Document Generation: Capable of producing structured reports, comparison tables, and executive briefs derived from user-provided documents.
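
The grounding behaviors listed above (verbatim quoting, the "not specified in the documents" refusal, and currency preservation) can be spot-checked on the caller's side. The following is a minimal sketch of such a check, not part of DocWain itself: the function names, the convention that quotes appear in straight double quotes, and the extra currency symbols beyond ₹, £ and $ are all assumptions made for illustration.

```python
import re

# Refusal phrase as quoted in the model card.
REFUSAL_PHRASE = "not specified in the documents"
# Illustrative symbol set (assumption); the card itself lists ₹, £ and $.
CURRENCY_SYMBOLS = "₹£$€¥"

_AMOUNT = re.compile(rf"[{re.escape(CURRENCY_SYMBOLS)}] ?\d[\d,]*(?:\.\d+)?")


def _normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not break matching."""
    return " ".join(text.split())


def _amounts(text: str) -> list[str]:
    """Extract symbol+number pairs, dropping inner spaces and trailing commas."""
    return [m.replace(" ", "").rstrip(",") for m in _AMOUNT.findall(text)]


def unsupported_quotes(response: str, source: str) -> list[str]:
    """Return double-quoted spans in the response that are not verbatim in the source."""
    haystack = _normalize(source)
    quotes = re.findall(r'"([^"]+)"', response)
    return [q for q in quotes if _normalize(q) not in haystack]


def altered_amounts(response: str, source: str) -> list[str]:
    """Return currency amounts in the response that never appear in the source."""
    known = set(_amounts(source))
    return [a for a in _amounts(response) if a not in known]


def is_refusal(response: str) -> bool:
    """True if the response contains the documented refusal phrase."""
    return REFUSAL_PHRASE in response.lower()


if __name__ == "__main__":
    source = "Invoice INV-001\nTotal payable: ₹1,20,000 due within 30 days."
    answer = 'The total is ₹1,20,000, quoted as "Total payable: ₹1,20,000".'
    print(unsupported_quotes(answer, source))  # quotes missing from the source
    print(altered_amounts(answer, source))     # amounts missing from the source
```

A check like this only catches surface-level violations (a paraphrased quote, a swapped currency symbol). It does not verify that a quoted passage actually supports the claim attached to it.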

Training and Uniqueness

The model was trained exclusively on synthetic data, so no customer data or scraped private data was used. The training set includes identity/persona examples, capability-awareness Q&A, synthetic document snippets paired with ideal grounded responses, and domain-mismatch refusal examples. Because the identity is trained into the weights, the model self-identifies as DocWain regardless of the system prompt. The fine-tuning targets consistent, fact-grounded outputs, which suits the model to sensitive enterprise document workflows.