singtan/solvrays-finetuned-pdf

Text Generation · Concurrency Cost: 1 · Model Size: 2.5B · Quantization: BF16 · Context Length: 8k · Published: Apr 30, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The singtan/solvrays-finetuned-pdf model, developed by Bibek Lama Singtan, is a 2.5-billion-parameter model based on Gemma 2B, fine-tuned for complex document understanding and technical metadata extraction. This standalone version ships with merged weights for zero-overhead inference and is optimized for technical PDF structures such as infrastructure guides and architectural documentation. It targets high-precision extraction and summarization over technical corpora, trained on text recovered via a hybrid digital/OCR pipeline. With an 8192-token context length, it is designed for integration into production pipelines that require specialized document intelligence.


Solvrays Finetuned PDF: Specialized Document Intelligence

This model is a high-performance, standalone version of Gemma 2B (2.5B parameters), developed by Bibek Lama Singtan and specifically fine-tuned for complex document understanding and technical metadata extraction. Unlike standard PEFT adapters, it features merged weights, allowing for zero-overhead inference as a native CausalLM, which simplifies integration into production environments.

Key Capabilities

  • Document Intelligence: Optimized for technical PDF structures, including infrastructure guides and architectural documentation.
  • High-Fidelity Data Processing: Trained on text recovered through a hybrid Digital/OCR pipeline, ensuring maximum data fidelity.
  • Precision Tasks: Tailored for high-precision extraction and summarization over technical corpora.
  • Seamless Deployment: Merged weights enable direct loading without separate adapter layers.
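Because the weights are merged, the model can be loaded as an ordinary CausalLM with the Hugging Face `transformers` library; no PEFT adapter attachment step is involved. The sketch below illustrates this under stated assumptions: the prompt template is illustrative (the card does not document the trained instruction format), and generation parameters are placeholders.

```python
MODEL_ID = "singtan/solvrays-finetuned-pdf"


def build_extraction_prompt(document_text: str, field: str) -> str:
    """Assemble a simple instruction prompt for metadata extraction.

    The template here is an assumption for illustration, not the
    model's documented training format.
    """
    return (
        f"Extract the '{field}' from the following document.\n\n"
        f"Document:\n{document_text}\n\nAnswer:"
    )


def run_demo() -> None:
    """Download the merged weights and run one extraction query.

    Imports are deferred so the module loads without transformers
    installed; call this on a machine with the weights available.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Merged weights: loads directly as a native CausalLM,
    # no separate adapter layers required.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    prompt = build_extraction_prompt("Title: Network Design Guide v3", "title")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Keeping the prompt builder separate from the inference call makes it easy to swap in whatever instruction format the model was actually trained on.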

Training & Limitations

The model was trained for 3 epochs using QLoRA (4-bit quantization) on the google/gemma-2b base model, with the adapter subsequently merged into FP16 weights. While highly effective on technical documentation, it is a generative LLM and may hallucinate when the input context is ambiguous. For critical data extraction, Retrieval-Augmented Generation (RAG) or strict prompting is recommended. The model is released under the Apache-2.0 license and adheres to the Google Gemma Prohibited Use Policy.
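The QLoRA-then-merge workflow described above can be sketched with the `peft` library: attach the trained adapter to the base model, then fold it into the weights with `merge_and_unload` so the result ships as a standalone FP16 CausalLM. The adapter directory and output path below are placeholders, since the original adapter is not published on this card.

```python
def merge_adapter(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Merge a QLoRA adapter into its base model and save FP16 weights.

    Assumptions: `adapter_dir` holds a PEFT LoRA adapter trained on
    `base_id` (here, google/gemma-2b); paths are hypothetical.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the full-precision base in FP16 (merging into a 4-bit
    # quantized model is not supported; the quantization was only
    # used during training).
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

    # Attach the LoRA adapter, then fold its deltas into the base
    # weights, returning a plain CausalLM with no adapter layers.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

The saved directory can then be loaded with `AutoModelForCausalLM.from_pretrained` alone, which is what makes the published checkpoint "zero-overhead" at inference time.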