chengyewang/TexOCR-RL

VISION | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 21, 2026 | License: cc-by-4.0 | Architecture: Transformer | Open Weights | Cold

TexOCR-RL by chengyewang is a 2-billion-parameter model based on TexOCR_OCR and fine-tuned with GRPO reinforcement learning. It converts document page images into compilable LaTeX code, targeting accurate page-to-LaTeX reconstruction for digitizing documents into structured, editable formats.


Overview

TexOCR-RL is a 2-billion-parameter model developed by chengyewang, built on the TexOCR_OCR base model. Its primary function is to reconstruct document pages as compilable LaTeX code. It is trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method, and differs from standard OCR approaches by focusing on output that preserves document structure and compiles.
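To make the GRPO idea concrete, the sketch below shows the group-relative advantage step in plain Python: several candidate outputs are sampled for the same input, each gets a scalar reward, and rewards are normalized within that group. The reward values are hypothetical, since this card does not describe the reward used to train TexOCR-RL, and the choice of population standard deviation and the `eps` constant are assumptions; implementations differ on both.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same input.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that beat their group's average get a positive value.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate LaTeX outputs of one page image
# (for example, 1.0 if a candidate compiled and matched the page well).
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# A group where every candidate scores the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Because the baseline is the group's own mean, no separate value model is needed to estimate it, which is the usual motivation given for GRPO.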

Key Capabilities

  • Image-to-LaTeX Conversion: Directly transforms document page images into LaTeX source code.
  • Compilable Output: Emphasizes generating LaTeX that is syntactically correct and can be compiled.
  • Reinforcement Learning Optimization: Fine-tuned with GRPO to improve LaTeX reconstruction quality.
  • Based on Qwen3-VL-2B-Instruct: The underlying architecture is a vision-language model, enabling robust image understanding.
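As a rough illustration of what "compilable" implies, the following sketch runs a lightweight structural check on LaTeX source: balanced braces and properly nested \begin/\end environments. It is a hypothetical pre-check, not part of the model or its training pipeline, and it does not replace running a real LaTeX compiler. The function name `check_latex_structure` is invented for this example, and the comment stripping is deliberately simple (it does not handle verbatim environments or a `\\` line break followed directly by `%`).

```python
import re

# Matches \begin{name} and \end{name}, capturing the kind and the name.
ENV_TOKEN = re.compile(r"\\(begin|end)\{([^}]*)\}")


def check_latex_structure(source):
    """Return a list of structural problems; an empty list means none found."""
    # Drop comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    problems = []

    # Brace balance, skipping backslash-escaped characters such as \{ and \}.
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                problems.append("unmatched closing brace")
                depth = 0
        i += 1
    if depth > 0:
        problems.append(f"{depth} unclosed brace(s)")

    # Environment nesting: every \end must close the most recent \begin.
    stack = []
    for kind, name in ENV_TOKEN.findall(text):
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            problems.append(f"unexpected \\end{{{name}}}")
    problems.extend(f"unclosed \\begin{{{name}}}" for name in stack)
    return problems


good = "\\begin{equation}\n  \\frac{a}{b} % note }\n\\end{equation}\n"
bad = "\\begin{itemize}\n\\item \\textbf{x\n\\end{enumerate}\n"
good_problems = check_latex_structure(good)
bad_problems = check_latex_structure(bad)
```

A check like this is cheap enough to run on every generated page before invoking a compiler, which remains the only definitive test.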

Good For

  • Digitizing Scientific Papers: Converting scanned academic documents or PDF pages into editable LaTeX.
  • Document Archiving: Creating structured, machine-readable versions of image-based documents.
  • Automated LaTeX Generation: Applications requiring the programmatic creation of LaTeX from visual input.
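For the automated-generation use case, a batch driver might look like the sketch below. It assumes a caller-supplied function that maps one page image path to LaTeX source; how that function invokes TexOCR-RL (local inference or a hosted endpoint) is not specified on this card, so the example uses a stub named `fake_page_to_latex` in its place. All names here are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_pages(image_paths, page_to_latex, out_dir):
    """Write one .tex file per page image and return the written paths.

    page_to_latex is any callable taking a Path and returning LaTeX source.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in map(Path, image_paths):
        latex_source = page_to_latex(image_path)
        target = out_dir / (image_path.stem + ".tex")
        target.write_text(latex_source, encoding="utf-8")
        written.append(target)
    return written


def fake_page_to_latex(image_path):
    # Stub standing in for a real model call; it never opens the image.
    return (
        "\\documentclass{article}\n\\begin{document}\n"
        + image_path.stem
        + "\n\\end{document}\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    outputs = convert_pages(["scan_001.png", "scan_002.png"], fake_page_to_latex, tmp)
    names = [path.name for path in outputs]
    first_text = outputs[0].read_text(encoding="utf-8")
```

Passing the conversion step in as a callable keeps the file handling testable without loading the model.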