muthugsubramanian/DocWain-14B-v2-unified
DocWain-14B-v2-unified by muthugsubramanian is a 14-billion-parameter enterprise document intelligence agent built on a vision-grafted Qwen3-14B base model, with a context length of 32,768 tokens. It is fine-tuned for extraction, analysis, comparison, and grounded response generation across enterprise document types. Identity and behavioral discipline are baked into the weights: the model quotes evidence verbatim, refuses to answer when data is missing, and preserves currency symbols, which suits it to document processing applications where fabricated details are unacceptable.
DocWain-14B-v2-unified: Enterprise Document Intelligence Agent
DocWain-14B-v2-unified is a 14-billion-parameter model developed by muthugsubramanian and designed as an enterprise document intelligence agent. Built on a vision-grafted Qwen3-14B base, this unified variant has a context length of 32,768 tokens. It was fine-tuned with LoRA-based supervised fine-tuning (SFT) on synthetic data so that identity, capability awareness, and strict behavioral discipline are embedded directly in its weights.
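Because the context length is fixed at 32,768 tokens, long document sets may need to be split before they are sent to the model. The following is a minimal sketch of one way to do that, not part of the model's own tooling. The helper name `chunk_by_budget`, the reserved-token margin, and the 4-characters-per-token estimate are all assumptions made for illustration; a real pipeline would count tokens with the model's actual tokenizer.

```python
CONTEXT_TOKENS = 32768  # context length stated for DocWain-14B-v2-unified


def chunk_by_budget(paragraphs, reserved_tokens=2048, chars_per_token=4.0):
    """Greedily pack paragraphs into chunks that fit an estimated token budget.

    Hypothetical helper. `reserved_tokens` leaves room for the prompt and the
    model's answer; `chars_per_token` is a rough heuristic, not a measured
    property of this model's tokenizer.
    """
    budget_chars = int((CONTEXT_TOKENS - reserved_tokens) * chars_per_token)
    chunks, current, size = [], [], 0
    for paragraph in paragraphs:
        cost = len(paragraph) + 2  # +2 for the blank-line separator
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and size + cost > budget_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        # Note: a single paragraph larger than the budget still becomes its
        # own (oversized) chunk; splitting inside paragraphs is not handled.
        current.append(paragraph)
        size += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs keeps quoted evidence intact within a chunk, which matters for a model that is expected to quote verbatim.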
Key Capabilities
- Accurate Extraction: Excels at extracting information from diverse enterprise documents like invoices, contracts, resumes, policies, and research papers.
- Document Intelligence: Provides summaries, identifies key findings, uncovers cross-document relationships, and surfaces anomalies.
- Layout and Context Understanding: Comprehends complex document structures including tables, charts, and multi-page references.
- Grounded Response Generation: Generates responses with verbatim quoting from evidence and explicitly states "not specified in the documents" when information is absent, preventing fabrication.
- Behavioral Discipline: Preserves currency symbols exactly as they appear in the source (e.g., ₹/£/$) and does not invent skills or experience that are absent from the source documents.
- Document Generation: Capable of producing structured reports, comparison tables, and executive briefs derived from user-provided documents.
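The grounding behaviors listed above (verbatim quoting, the "not specified in the documents" refusal, and currency preservation) can be spot-checked on the caller's side. The sketch below is a hypothetical post-processing check, not something shipped with the model: the function names, the set of currency symbols, and the assumption that quoted evidence appears inside double quotes are all illustrative choices.

```python
import re

REFUSAL_PHRASE = "not specified in the documents"  # wording from the model card
CURRENCY_PATTERN = r"[₹£$€¥](?=\s?\d)"  # assumed symbol set, followed by a digit


def extract_quotes(response: str) -> list[str]:
    """Return text found inside straight or curly double quotes."""
    return re.findall(r'["“]([^"”]+)["”]', response)


def check_grounding(response: str, source: str) -> dict:
    """Compare a model response against the source document text.

    Hypothetical checker: flags quoted spans that do not occur verbatim in
    the source and currency symbols that the source never uses.
    """
    unsupported = [q for q in extract_quotes(response) if q not in source]
    source_symbols = set(re.findall(CURRENCY_PATTERN, source))
    response_symbols = set(re.findall(CURRENCY_PATTERN, response))
    return {
        "refused": REFUSAL_PHRASE in response.lower(),
        "unsupported_quotes": unsupported,
        "introduced_currencies": sorted(response_symbols - source_symbols),
    }


if __name__ == "__main__":
    source = 'Invoice total: ₹45,000 payable within "30 days of receipt".'
    grounded = 'The total is ₹45,000, due within "30 days of receipt".'
    drifted = 'The total is $45,000, due within "15 days".'
    print(check_grounding(grounded, source))
    print(check_grounding(drifted, source))
```

A check like this only catches surface-level drift (altered quotes, swapped currency symbols); it does not verify that unquoted claims are supported by the documents.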
Training and Uniqueness
The model was trained exclusively on synthetic data; no customer data or scraped private data was used. The training set includes identity/persona examples, capability-awareness Q&A, synthetic document snippets paired with ideal grounded responses, and domain-mismatch refusal examples. Because its identity is baked into the weights, the model identifies itself as DocWain regardless of the system prompt. The fine-tuned behavior is intended to keep outputs consistent and grounded in the provided documents, which makes the model a fit for sensitive enterprise document workflows.