Name: lunahr/CeluneNorm-0.6B-v1.2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lunahr

Overview

CeluneNorm-0.6B-v1.2 is a compact, 0.8-billion-parameter text normalization model developed by lunahr and based on Qwen3-0.6B-Base. Its primary function is to clean and standardize poorly formatted text for applications such as text-to-speech (TTS) and general text preprocessing. The model is deliberately conservative: it preserves the original meaning and structure of the input, and it does not rewrite sentences or alter domain-specific tokens such as URLs or names. Version 1.2 improves casing accuracy over its predecessor.

Key Capabilities

Lightweight and Efficient: A 0.8B parameter model suitable for integration into various pipelines.
Conservative Normalization: Prioritizes meaning preservation, avoiding aggressive corrections or sentence rewrites.
Improved Casing: Version 1.2 offers better capitalization for specific names and phrases.
Deterministic Output: Provides consistent results without sampling.
Handles Mixed Text: Capable of processing natural language alongside technical content.
Multi-Sentence Support: Can normalize multiple sentences when boundaries are clear.
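
The "Deterministic Output" item above can be illustrated with a minimal sketch of greedy (non-sampling) decoding. This is not taken from the model card. It assumes that the name in this listing is also the Hugging Face repository id, that the checkpoint loads as a standard causal language model through the transformers library, and that raw text is passed directly as the prompt. The function names `greedy_generation_kwargs` and `normalize` are hypothetical, and the actual prompt format should be checked in the full README.

```python
# Assumption: the listing name doubles as the Hugging Face repository id.
MODEL_ID = "lunahr/CeluneNorm-0.6B-v1.2"


def greedy_generation_kwargs(max_new_tokens: int = 128) -> dict:
    """Return generation settings that turn sampling off (greedy decoding)."""
    return {
        "do_sample": False,  # no sampling: always pick the most likely token
        "num_beams": 1,      # plain greedy search rather than beam search
        "max_new_tokens": max_new_tokens,
    }


def normalize(text: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: run one greedy generation pass over `text`.

    Assumption: the raw text is used as the prompt as-is. The real model
    may expect a specific prompt template described in its README.
    """
    # Imported lazily so the settings helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, **greedy_generation_kwargs(max_new_tokens))

    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With sampling disabled, repeated calls on the same input should produce the same output on the same hardware and software stack, which is the property a preprocessing pipeline usually wants.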

Training and Performance

The model was fine-tuned from Qwen3-0.6B-Base on a mixed dataset of formal, conversational, and synthetic text, plus an additional 10k rows aimed at improving casing. Training ran for 3 epochs, followed by 1 further epoch for casing, and reached a mean token accuracy of 97.53% (99.77% for casing CFT). Although token accuracy is high, real-world human-level correctness is estimated at 90–95%.
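
The listing does not say how mean token accuracy was computed. A common definition, assumed here, is the fraction of positions where the predicted token matches the reference token, averaged over examples. The sketch below uses hypothetical function names and made-up token ids to show the metric. It also shows why a high token-level score can sit alongside a lower whole-output correctness rate: a single wrong token leaves token accuracy high but makes the output as a whole incorrect.

```python
def token_accuracy(predicted: list[int], reference: list[int]) -> float:
    """Fraction of positions where the predicted token equals the reference token."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def mean_token_accuracy(pairs: list[tuple[list[int], list[int]]]) -> float:
    """Average the per-example token accuracy over a list of (predicted, reference) pairs."""
    if not pairs:
        return 0.0
    return sum(token_accuracy(p, r) for p, r in pairs) / len(pairs)


# Illustrative token ids only; these are not results from the model.
examples = [
    ([1, 2, 3, 4], [1, 2, 0, 4]),  # one wrong token out of four
    ([5, 6], [5, 6]),              # fully correct
]
score = mean_token_accuracy(examples)  # (0.75 + 1.0) / 2
```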

Limitations

CeluneNorm is not a full grammar correction system. It may occasionally miss punctuation or casing corrections, be conservative with contractions, and preserve ambiguous casing. It is most reliable for sequences under 128 tokens and prioritizes safety and meaning preservation over extensive linguistic correction.
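
Because the model is most reliable under 128 tokens, longer inputs can be split into smaller pieces before normalization. The following is a minimal sketch rather than the author's method. The function names are hypothetical, whitespace-separated words stand in for model tokens (real tokenizers usually yield more tokens than words, so a lower budget may be safer), and sentence boundaries are detected from terminal punctuation, which poorly formatted input may lack.

```python
import re

MAX_TOKENS = 128  # budget taken from the reliability limit stated above


def approx_token_count(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def chunk_sentences(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Group whole sentences into chunks that stay within `max_tokens`.

    A single sentence longer than the budget is kept as its own
    oversized chunk rather than being cut mid-sentence.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = approx_token_count(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be normalized on its own and the results joined in order. Replacing `approx_token_count` with a count from the model's own tokenizer would make the budget exact.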
