melsmm/Spell-Corrector-RU-4B
The melsmm/Spell-Corrector-RU-4B is a 4 billion parameter language model, fine-tuned by melsmm on RefalMachine/RuadaptQwen3-4B-Instruct, specifically designed for automatic correction of spelling, punctuation, and case errors in Russian texts. This model excels at restoring punctuation, correcting orthographic mistakes, and adjusting letter casing, offering significantly faster inference speeds compared to T5-based alternatives. Its primary use case is efficient and accurate Russian text correction, particularly in production environments via an OpenAI-compatible API.
Loading preview...
Overview
melsmm/Spell-Corrector-RU-4B is a 4 billion parameter language model, built upon RefalMachine/RuadaptQwen3-4B-Instruct (a Russian-adapted version of Qwen/Qwen3-4B-Instruct-2507). It has been fine-tuned using LoRA to specialize in correcting various errors in Russian text. The LoRA adapter is already merged, allowing direct use of the model.
Key Capabilities
- Orthographic Correction: Fixes spelling mistakes and typos.
- Punctuation Restoration: Corrects and restores punctuation.
- Case Correction: Adjusts letter casing (uppercase/lowercase) and the letter "ё".
Training Details
The model was trained in two stages:
- Stage 1: On approximately 1 million synthetic errors introduced into clean Russian corpora (nerus, gazeta, wikipedia) using SAGE methods and a custom punctuation corruption algorithm.
- Stage 2: On about 30,000 high-quality "error → correction" pairs from open datasets like RUSpellRU, MultidomainGold, and GEC.
Performance and Differentiators
While Spell-Corrector-RU-4B may show slightly lower F1 scores for spelling on some benchmarks compared to SAGE (T5-based models), it demonstrates competitive performance in punctuation and case correction. Its main advantage lies in its high inference speed, being approximately 6 times faster than sage-fredt5-large when deployed via vLLM, making it suitable for production environments. It also offers ease of integration through an OpenAI-compatible API and simplified domain adaptation via LoRA.
Limitations
- May be outperformed by SAGE in orthography on certain domains.
- Quality can decrease on highly specialized domains (e.g., medical texts), where further LoRA fine-tuning is recommended.
- Designed exclusively for the Russian language.