tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC

7B
FP8
4096
Feb 11, 2025
License: llama2
Overview

tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC is a grammatical error correction (GEC) model developed by tartuNLP. It corrects errors across entire paragraphs rather than individual sentences. Within a larger pipeline it is the model designated M1: it receives the user's input paragraph and outputs the corrected text.
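The paragraph-in, corrected-paragraph-out step can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not the authors' inference code: the prompt template, the helper names build_prompt and correct_paragraph, and the max_new_tokens default are all hypothetical, and the format the model was actually trained on should be taken from the official model card.

```python
MODEL_ID = "tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC"

# ASSUMPTION: placeholder instruction-style template for illustration only.
# Replace it with the prompt format documented for this model.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Correct all grammatical and spelling errors in the Estonian paragraph below.\n\n"
    "### Input:\n{paragraph}\n\n"
    "### Response:\n"
)


def build_prompt(paragraph: str) -> str:
    """Wrap a whole paragraph (not a single sentence) in the assumed template."""
    return PROMPT_TEMPLATE.format(paragraph=paragraph.strip())


def correct_paragraph(paragraph: str, max_new_tokens: int = 512) -> str:
    """Run the model on one paragraph and return only the newly generated text."""
    # Imported lazily so the prompt helper can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    prompt = build_prompt(paragraph)
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: correction should be deterministic rather than creative.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the model's continuation is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Example call (downloads the 7B weights on first use):
# corrected = correct_paragraph("Ma käisin eile poes ja ostsin kolm õuna.")
```

Slicing off the prompt tokens before decoding avoids relying on the decoded text reproducing the prompt character for character.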

Key Capabilities

  • Paragraph-level Error Correction: Handles whole paragraphs as input, providing comprehensive corrections.
  • Estonian Language Focus: Specifically tailored for grammatical error correction in Estonian.
  • Synthetic Data Utilization: Addresses data scarcity by leveraging synthetic training data generated from proprietary large language models.
  • Open-Weight Release: Released as an open-weight model under the llama2 license, alongside the synthetic error correction and explanation data generated for it.

Good For

  • Educational Applications: Suited to learners of Estonian who need grammatical error correction, potentially accompanied by generated explanations.
  • Estonian Language Processing: Useful for tasks requiring robust GEC for Estonian text.
  • Research and Development: Provides open-weight models and synthetic data for further research in GEC, especially for low-resource languages.

This model is a result of research detailed in the paper "Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian" by Vainikko et al. (2025).