surogate/Qwen3-1.7B-Libra-MF
surogate/Qwen3-1.7B-Libra-MF is a 1.7 billion parameter Qwen3-based language model fine-tuned by Surogate to extract column-mapping recipes from Romanian fixed-asset registers. This specialized model processes varied register layouts, including grouped and column-based formats, and outputs structured JSON indicating column roles for accounting totals. It is designed to accurately identify and map six critical accounting fields, addressing common ambiguities that challenge general-purpose LLMs.
Model Overview
surogate/Qwen3-1.7B-Libra-MF is a specialized 1.7 billion parameter model, fine-tuned from Qwen/Qwen3-1.7B, designed to process Romanian fixed-asset registers. Its primary function is to read raw, often inconsistent, register text and generate a structured JSON "column-mapping recipe." This recipe identifies the column index and header text for eight specific accounting roles, enabling a deterministic post-processor to calculate six key accounting totals per asset category.
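A minimal sketch of what such a deterministic post-processor might look like for a column-based register. The role keys (`valoare_intrare`, and so on), the `index`/`header` field names, and the assumption that rows are already split into cells are all illustrative and not taken from the model's actual schema. Amounts are assumed to use Romanian formatting (`1.234,56`).

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical role keys; the real recipe schema may name them differently.
VALUE_ROLES = [
    "valoare_intrare",
    "valoare_modernizari",
    "valoare_inventar",
    "valoare_amortizata",
    "amortizare_lunara",
    "valoare_ramasa",
]


def parse_ro_number(text: str) -> Decimal:
    """Parse a Romanian-formatted amount: '.' groups thousands, ',' is the decimal mark."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned) if cleaned else Decimal("0")


def totals_per_category(rows, recipe):
    """Sum the six value columns per account category, using the recipe's 0-based indices."""
    cont_idx = recipe["cont"]["index"]
    totals = defaultdict(lambda: {role: Decimal("0") for role in VALUE_ROLES})
    for cells in rows:
        category = cells[cont_idx].strip()
        for role in VALUE_ROLES:
            totals[category][role] += parse_ro_number(cells[recipe[role]["index"]])
    return dict(totals)


# Illustrative recipe and rows (invented for this example).
recipe = {"cont": {"header": "Cont", "index": 0}}
for position, role in enumerate(VALUE_ROLES, start=1):
    recipe[role] = {"header": role, "index": position}

rows = [
    ["213", "1.234,56", "0,00", "1.234,56", "234,56", "10,00", "1.000,00"],
    ["213", "2.500,00", "500,00", "3.000,00", "1.000,00", "25,00", "2.000,00"],
    ["214", "800,00", "0,00", "800,00", "300,00", "5,00", "500,00"],
]

totals = totals_per_category(rows, recipe)
```

Using `Decimal` rather than `float` keeps the sums exact, which matters when the totals are reconciled against accounting ledgers.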
Key Capabilities & Differentiators
- Specialized Data Extraction: Accurately maps columns for Valoare intrare, Valoare modernizări, Valoare de inventar, Valoare amortizată, Amortizare lunară, and Valoare rămasă from diverse Romanian fixed-asset register layouts.
- Handles Layout Variance: Robustly processes registers with differing headers, grouped vs. column-based structures, "trap columns" (e.g., monthly vs. cumulative depreciation), and challenging formats like headerless or OCR-mangled text.
- Addresses General LLM Failures: Specifically engineered to overcome common errors observed in general GPT-4-class models when dealing with these complex accounting documents, such as confusing monthly vs. cumulative depreciation or misidentifying value columns.
- Output Schema: Emits a JSON object detailing column mappings, including header text and 0-based index, and indicates whether the register is 'grouped' (cont = null) or 'column-based' (cont present).
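The output schema described above could be consumed roughly as follows. This is a sketch under assumptions: the field names `header` and `index`, the role key `valoare_intrare`, and the example JSON string are hypothetical, and only the `cont` null-vs-present convention comes from the description above.

```python
import json


def read_recipe(raw_output: str, n_columns: int):
    """Parse the model's JSON output, classify the layout, and sanity-check indices."""
    recipe = json.loads(raw_output)
    # cont = null means a grouped register; a cont mapping means column-based.
    layout = "grouped" if recipe.get("cont") is None else "column-based"
    for role, mapping in recipe.items():
        if mapping is None:
            continue
        index = mapping["index"]
        if not 0 <= index < n_columns:
            raise ValueError(f"{role}: index {index} outside 0..{n_columns - 1}")
    return layout, recipe


# Invented example output for a grouped register with 8 columns.
example_output = (
    '{"cont": null, '
    '"valoare_intrare": {"header": "Valoare intrare", "index": 3}}'
)
layout, recipe = read_recipe(example_output, n_columns=8)
```

Checking each index against the actual column count before running any post-processing is a cheap guard against a recipe that points at a column the register does not have.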
Performance
Reported results on the recipe-extraction task:
- Real Client Registers: Achieved 7/7 (100%) accuracy on end-to-end 6-field totals for real client registers.
- Held-out Synthetic Data: Scored 95.3% on 360 held-out synthetic examples across 12 formats.
- Validation Set: Achieved 92.6% accuracy on 600 validation examples, comparing the model's recipe against the ground-truth recipe.
Limitations
- Romanian Only: Primarily designed for Romanian registers, with limited robustness for fully English inputs.
- Specific Categories: Focuses on fixed-asset accounts (categories 205 to 215).
- Input Length: Optimized for inputs up to 2048 tokens; very long registers should be windowed.
- Post-processor Required: The model outputs a recipe; a separate post-processor is needed to compute final accounting totals.
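One way to window a long register, sketched under assumptions: the function below splits on line boundaries so no row is cut in half, and repeats the header line(s) in every window so each window still carries the column names. The `window_register` helper and its parameters are hypothetical. The default token counter is a crude whitespace split, so in practice you would pass a counter backed by the model's own tokenizer.

```python
def window_register(lines, max_tokens=2048, header_lines=1, count_tokens=None):
    """Split register lines into windows that each fit within max_tokens."""
    # Fallback counter is only a rough proxy for real tokenizer counts.
    count = count_tokens or (lambda text: len(text.split()))
    header = lines[:header_lines]
    header_cost = sum(count(line) for line in header)

    windows, current, used = [], [], header_cost
    for line in lines[header_lines:]:
        cost = count(line)
        # Start a new window when adding this row would exceed the budget.
        if current and used + cost > max_tokens:
            windows.append(header + current)
            current, used = [], header_cost
        current.append(line)
        used += cost
    if current:
        windows.append(header + current)
    return windows


# Invented register: one header line followed by ten rows.
register = ["Cont Denumire Valoare"] + [f"213 Asset{i} 100,00" for i in range(10)]
windows = window_register(register, max_tokens=12)
```

Because the recipe describes columns rather than rows, the recipes from successive windows of the same register should agree. A mismatch between windows is a useful signal that one of them was misread.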