muthugsubramanian/DocWain-14B-v2-unified-dpo
DocWain-14B-v2-unified-dpo is a 14-billion-parameter enterprise document intelligence agent built on a vision-grafted Qwen3-14B base model. This FP16, DPO-refined variant is designed for accurate extraction, analysis, comparison, and grounded response generation across enterprise document types. Its identity and behavioral discipline, including verbatim quoting and refusal to fabricate information, are trained into the weights rather than supplied by a prompt, which suits it to document-centric applications.
DocWain-14B-v2-unified-dpo: Enterprise Document Intelligence
DocWain-14B-v2-unified-dpo is a 14-billion-parameter model, refined with DPO from an SFT-unified base and built on a vision-grafted Qwen3-14B architecture. It functions as an enterprise document intelligence agent, specializing in extraction, analysis, comparison, and grounded response generation from user-uploaded document profiles. "Unified" means that identity, capability awareness, and behavioral discipline are integrated directly into the weights through a focused LoRA SFT fine-tune on synthetic data.
Key Capabilities
- Accurate Extraction: Excels at extracting information from diverse enterprise documents like invoices, contracts, resumes, and research papers.
- Document Intelligence: Provides summaries, identifies key findings, surfaces cross-document relationships, and detects anomalies.
- Context Understanding: Interprets layout, tables, charts, and multi-page references.
- Grounded Response Generation: Generates responses with verbatim quoting from evidence and explicitly states "not specified in the documents" when information is absent, preventing fabrication.
- Document Generation: Capable of creating structured reports, comparison tables, and executive briefs derived from source documents.
- Baked-in Behavior: Self-identifies as DocWain, preserves currency symbols (₹/£/$), and refuses to add unverified skills/education/experience.
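The grounded-response behavior described above can be spot-checked on the application side. The following is a minimal sketch, not part of DocWain itself: it assumes responses mark verbatim evidence with straight double quotes, and the helper names (`extract_quotes`, `check_grounding`) are hypothetical. It only tests that each quoted span occurs in the source text after whitespace normalization.

```python
from __future__ import annotations

import re

# Phrase the model card says is used when information is absent.
NOT_SPECIFIED = "not specified in the documents"


def _normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not break matching."""
    return " ".join(text.split())


def extract_quotes(response: str) -> list[str]:
    """Return every span enclosed in straight double quotes (assumed convention)."""
    return re.findall(r'"([^"]+)"', response)


def check_grounding(response: str, document: str) -> dict:
    """Report which quoted spans are missing from the source document."""
    doc = _normalize(document)
    quotes = extract_quotes(response)
    unsupported = [q for q in quotes if _normalize(q) not in doc]
    return {
        "quotes": quotes,
        "unsupported": unsupported,
        "declined": NOT_SPECIFIED in response.lower(),
        "grounded": not unsupported,
    }


if __name__ == "__main__":
    source = "Invoice INV-001\nTotal due: ₹12,500 payable within\n30 days."
    answer = 'The invoice states "Total due: ₹12,500 payable within 30 days."'
    print(check_grounding(answer, source))
```

A check like this catches altered quotes, such as a swapped currency symbol, but it does not verify unquoted paraphrases, so it complements rather than replaces human review.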
Training and Usage
The model was trained exclusively on synthetic data, so no customer data or scraped private data was used. The training corpus includes identity/persona examples, capability-awareness Q&A, synthetic document snippets paired with ideal grounded responses, and domain-mismatch refusal examples. Because its identity and core behaviors are embedded in the weights, a short system prompt is sufficient at runtime. Recommended runtimes include FP16 on an A100 80GB, AWQ INT4 with vLLM, and GGUF Q5_K_M or Q4_K_M with Ollama/llama.cpp.
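To see why the FP16 variant is paired with an 80 GB GPU while INT4 fits much smaller hardware, a back-of-the-envelope estimate of weight memory helps. This sketch assumes exactly 14 billion parameters (the real count, including the vision graft, may differ) and nominal bit widths; it ignores quantization scales, the KV cache, and activations, all of which add to the real footprint.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a round 14e9 parameters, taken from the "14 billion" figure.
N_PARAMS = 14e9

if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("INT4 (nominal)", 4)]:
        print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB of weights")
```

Under these assumptions FP16 weights take about 28 GB and nominal INT4 weights about 7 GB, which leaves headroom on an 80 GB card for long document contexts.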