Oolel-Corrector: Wolof Orthography Normalization
Oolel-Corrector is a 1.5-billion-parameter model developed by soynade-research, fine-tuned from Oolel-Small-v0.1. Its primary function is to normalize informal Wolof text, such as that commonly found on social media platforms, into standard written orthography. Because Wolof is used largely as an oral language, informal written Wolof is often inconsistent, with spellings shaped by phonetic transcription and French conventions. The model targets these inconsistencies: it restores vowel diacritics, restores geminates, and fixes French-influenced spellings.
Key Capabilities
- Orthography Correction: Converts informal Wolof spellings to formal, standard orthography.
- Code-Switching Preservation: Leaves code-switched French and English segments within the text untouched.
- Preprocessing: Cleans raw Wolof social media data for downstream NLP tasks.
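The code-switching behaviour listed above can be spot-checked on your own data. The sketch below is a minimal, hypothetical helper (the name `changed_segments` is not part of the model or any library): given the normalized output and a list of French or English segments you expect to survive unchanged, it reports the ones that no longer appear verbatim. It assumes you already know which segments are code-switched, for example from manual annotation.

```python
from typing import Iterable, List


def changed_segments(normalized: str, protected: Iterable[str]) -> List[str]:
    """Return the protected segments that do not appear verbatim in `normalized`.

    `protected` is an assumption of this sketch: the caller supplies the
    French/English segments that should have been left untouched.
    """
    return [segment for segment in protected if segment not in normalized]
```

An empty result means every listed segment survived normalization unchanged; a non-empty result points at outputs worth reviewing by hand.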
Good For
- Dataset Creation and Cleaning: Normalizing large corpora of informal Wolof text for training, annotation, or evaluation of other NLP systems.
- Processing Layers: Serving as a preprocessing step in pipelines that handle Wolof text, such as sentiment analysis, topic classification, or machine translation, so that downstream components receive consistently spelled input.
- Keyboard and Writing Tools: Integration into applications to provide real-time standardized spelling suggestions for users writing in Wolof.
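For the dataset-cleaning and preprocessing uses above, a corpus pass might look like the following sketch. It deliberately does not load the model: `normalize` is a placeholder for whatever callable wraps Oolel-Corrector in your setup (how the model is loaded and prompted is not specified here, so that part is left to the caller). The function name `normalize_corpus` and the caching of repeated lines are illustrative choices, not part of the model.

```python
from typing import Callable, Dict, Iterable, List


def normalize_corpus(
    lines: Iterable[str],
    normalize: Callable[[str], str],
) -> List[str]:
    """Normalize every non-empty line, calling `normalize` once per distinct line.

    `normalize` is assumed to wrap the corrector model; it is supplied by
    the caller and is not defined in this sketch.
    """
    cache: Dict[str, str] = {}
    cleaned: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # drop blank lines
        if text not in cache:
            # Social media corpora often repeat lines; avoid redundant model calls.
            cache[text] = normalize(text)
        cleaned.append(cache[text])
    return cleaned
```

Because the normalizer is injected, the same function can be exercised with a stub in tests and with the real model in production.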
Limitations
- Correction Only: The model performs orthography normalization and does not translate text. For translation of informal Wolof, it must be paired with a separate translation model after normalization.
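The pairing described above amounts to composing two steps in a fixed order: normalize first, then translate. A minimal sketch, assuming both steps are available as plain string-to-string callables (`normalize` wrapping Oolel-Corrector and `translate` wrapping a separate translation model of your choice; both names and the helper `make_translation_pipeline` are hypothetical):

```python
from typing import Callable


def make_translation_pipeline(
    normalize: Callable[[str], str],
    translate: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a function that normalizes text and then translates the result.

    Both callables are assumptions of this sketch and must be provided
    by the caller; neither is defined by Oolel-Corrector itself.
    """
    def run(text: str) -> str:
        standardized = normalize(text)  # orthography correction only
        return translate(standardized)  # translation happens afterwards
    return run
```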