Overview
tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC is a specialized grammatical error correction (GEC) model developed by tartuNLP. It corrects errors across entire paragraphs rather than one sentence at a time. Within a larger pipeline, it acts as the correction model (M1): the user's input paragraph is passed to it, and it outputs the corrected text.
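The paragraph-in, corrected-text-out step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' inference code. It assumes the model loads through the Hugging Face `transformers` text-generation pipeline. The prompt template is a placeholder, because the actual prompt format is not stated here. The helper names `build_prompt`, `load_generator`, and `correct_paragraph` are hypothetical.

```python
from typing import Callable

MODEL_ID = "tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC"

# ASSUMPTION: placeholder template. The prompt format the model was trained
# with is not given in this document; replace this with the official one.
PROMPT_TEMPLATE = "Correct the following paragraph:\n{paragraph}\nCorrected:\n"


def build_prompt(paragraph: str) -> str:
    """Wrap one whole paragraph (not a single sentence) in the prompt."""
    return PROMPT_TEMPLATE.format(paragraph=paragraph.strip())


def load_generator(max_new_tokens: int = 512) -> Callable[[str], str]:
    """Return a prompt -> completion function backed by the model.

    transformers is imported lazily so the rest of this sketch can be
    exercised without downloading the weights.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID)

    def generate(prompt: str) -> str:
        # Greedy decoding; return only the newly generated text.
        out = pipe(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            return_full_text=False,
        )
        return out[0]["generated_text"]

    return generate


def correct_paragraph(paragraph: str, generate: Callable[[str], str]) -> str:
    """Pass the user's paragraph to the model (M1) and return its output."""
    return generate(build_prompt(paragraph)).strip()
```

Calling `correct_paragraph(text, load_generator())` would download the weights and run the model. Passing the generator in as a plain callable keeps the correction step easy to test with a stub and easy to swap for a different serving backend.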
Key Capabilities
- Paragraph-level Error Correction: Handles whole paragraphs as input, providing comprehensive corrections.
- Estonian Language Focus: Specifically tailored for grammatical error correction in Estonian.
- Synthetic Data Utilization: Addresses data scarcity by leveraging synthetic training data generated from proprietary large language models.
- Open-Weight Release: Released as an open-weight model under a permissive license, together with the synthetic error correction and explanation data generated for it.
Good For
- Educational Applications: Suited to tools for learners of Estonian that need grammatical error correction, potentially combined with generated explanations.
- Estonian Language Processing: Useful for tasks requiring robust GEC for Estonian text.
- Research and Development: Provides open-weight models and synthetic data for further research in GEC, especially for low-resource languages.
This model is a result of research detailed in the paper "Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian" by Vainikko et al. (2025).