Name: stukenov/sozkz-fix-qwen-500m-kk-gec-v4 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: stukenov

SozKZ Fix Qwen 500M — Kazakh GEC v4 Overview

This model, developed by stukenov, is a 447-million-parameter grammatical error correction (GEC) model for Kazakh. It is the fourth version in its series and builds on stukenov/sozkz-fix-qwen-500m-kk-gec-v3.

Key Differentiators & Capabilities

KTO Preference Optimization: Trained with Kahneman-Tversky Optimization (KTO) on 26,404 preference pairs, which improves punctuation handling beyond what standard supervised fine-tuning achieves.
Enhanced Punctuation Correction: Dedicated training data improves comma insertion before conjunctions and after introductory words, as well as period placement.
Grammar and Word Usage Correction: Addresses general grammatical and word usage errors in Kazakh text.
Pipeline Integration: Performs best when paired with an external 'emle' (spelling) pre-fixer. The pre-fixer handles character substitution errors, so the GEC model can focus on grammar and punctuation.
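
To make the KTO item above more concrete, here is a minimal sketch of how preference pairs are commonly reshaped for KTO-style training, which works on individual completions marked desirable or undesirable. The field names (prompt, completion, label) follow the convention used by common KTO trainers; the function name unpair_preferences and the sample sentence are illustrative assumptions, not the author's actual data or training code.

```python
from typing import Dict, List, Union

Example = Dict[str, Union[str, bool]]


def unpair_preferences(pairs: List[Dict[str, str]]) -> List[Example]:
    """Split each (prompt, chosen, rejected) pair into two labeled examples.

    Hypothetical helper: the source does not describe the real data format.
    """
    examples: List[Example] = []
    for pair in pairs:
        # The preferred correction becomes a desirable example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["chosen"], "label": True}
        )
        # The dispreferred output becomes an undesirable example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["rejected"], "label": False}
        )
    return examples


# Illustrative pair: the chosen output inserts a comma before the conjunction.
sample_pairs = [
    {
        "prompt": "Мен келдім бірақ ол кетіп қалды.",
        "chosen": "Мен келдім, бірақ ол кетіп қалды.",
        "rejected": "Мен келдім бірақ ол кетіп қалды.",
    }
]

kto_examples = unpair_preferences(sample_pairs)
```

Under this assumed layout, each preference pair yields two training examples, one desirable and one undesirable.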

Use Cases

Automated Kazakh Text Correction: Ideal for applications requiring automated correction of grammatical, punctuation, and word usage errors in Kazakh language content.
Improving Text Quality: Useful for enhancing the readability and correctness of written Kazakh, particularly in contexts where precise punctuation is critical.

Limitations

Standalone 'emle' (spelling) accuracy is reduced; the model relies on the external 'emle' pipeline for comprehensive spelling correction.
Used on its own, the model scores 5% on a custom GEC benchmark, which reflects its specialized role within a broader correction pipeline rather than as a general-purpose fixer.
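
The two-stage arrangement these limitations imply can be sketched as follows: a spelling pre-fixer runs first, and its output is passed to the GEC model. Both stages here are toy stand-ins written for illustration; toy_spell_fix and toy_gec_fix are hypothetical and do not reproduce the real 'emle' pre-fixer or the model's behavior. In practice the second stage would be a call to the hosted model.

```python
import re
from typing import Callable

Corrector = Callable[[str], str]


def make_pipeline(spell_fix: Corrector, gec_fix: Corrector) -> Corrector:
    """Compose a spelling pre-fixer with a grammar/punctuation corrector."""
    def run(text: str) -> str:
        # Spelling is fixed first so the GEC stage sees clean characters.
        return gec_fix(spell_fix(text))
    return run


def toy_spell_fix(text: str) -> str:
    # Hypothetical stand-in for the 'emle' stage: repairs one known
    # character substitution ("к" typed instead of "қ").
    return text.replace("калды", "қалды")


def toy_gec_fix(text: str) -> str:
    # Hypothetical stand-in for the GEC model: inserts a comma before the
    # conjunction "бірақ" unless one is already present.
    return re.sub(r"(?<!,) бірақ", ", бірақ", text)


correct = make_pipeline(toy_spell_fix, toy_gec_fix)

raw_text = "Мен келдім бірақ ол кетіп калды."
fixed_text = correct(raw_text)
```

Keeping the stages as separate callables means either one can be swapped out, or the GEC stage can be evaluated alone, without touching the other.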
