Name: shibing624/chinese-text-correction-7b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: shibing624

Overview

shibing624/chinese-text-correction-7b is a 7.6 billion parameter model built upon the Qwen/Qwen2.5-7B-Instruct architecture, specifically fine-tuned for Chinese text correction (CTC). This model addresses both Chinese Spelling Correction (CSC), which handles sound-alike, shape-alike, and grammar errors with length-aligned corrections, and broader CTC, which also includes multi-character or missing-character errors that are length-unaligned.

Key Capabilities

Comprehensive Chinese Text Correction: Corrects spelling, grammar, and structural errors in Chinese text.
Handles Varied Error Types: Capable of correcting both length-aligned errors (e.g., homophones, similar-looking characters) and length-unaligned errors (e.g., missing or extra characters).
Strong Performance: Achieves an average F1 score of 0.8225 across various benchmarks, notably 0.9798 on EC-LAW and 0.9959 on MCSC.
Integration: Designed to be used with the pycorrector library for easy integration into correction workflows, or directly via Hugging Face Transformers.

Training Details

Base Model: Qwen/Qwen2.5-7B-Instruct.
Dataset: Trained on the shibing624/chinese_text_correction dataset.
Parameters: Trained for 8 epochs with a batch size of 2 over 36,000 steps.

Good for

Applications requiring high-accuracy Chinese spelling and grammar correction.
Developers looking for a specialized model to improve the quality of Chinese text input or generated content.
Integration into larger NLP pipelines for pre-processing or post-processing Chinese text.

Overview

Overview

Key Capabilities

Training Details

Good for

Full Model Card (README)