CeluneNorm-0.6B-v1.1: Lightweight Text Normalization
CeluneNorm-0.6B-v1.1, developed by lunahr, is a 0.6 billion parameter causal language model fine-tuned from Qwen3-0.6B-Base. Its primary function is text normalization: transforming poorly formatted input into clean, readable text suitable for Text-to-Speech (TTS) and other preprocessing pipelines. The model is deliberately conservative, prioritizing preservation of the original meaning and avoiding sentence rewriting or changes to domain-specific tokens such as URLs and names.
Key Capabilities
- Deterministic output: Ensures consistent normalization without sampling.
- Meaning preservation: Avoids altering the original intent or content of the text.
- Structure and intent handling: Maintains the structural integrity of the input.
- Mixed text support: Effectively processes natural language combined with technical content.
- Conservative punctuation: Prefers standard punctuation like periods over exclamation marks unless explicitly indicated.
- Multi-sentence normalization: Can normalize multiple sentences when boundaries are clear.
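Taken together, these properties imply plain greedy decoding at inference time. The sketch below shows one way to call the model with the Hugging Face transformers API; the Hub model ID (`lunahr/CeluneNorm-0.6B-v1.1`) and the prompt template are assumptions for illustration, not documented behavior.

```python
MODEL_ID = "lunahr/CeluneNorm-0.6B-v1.1"  # assumed Hugging Face Hub ID


def build_prompt(raw_text: str) -> str:
    # Hypothetical instruction format; adjust to the model's actual template.
    return f"Normalize the following text.\nInput: {raw_text}\nOutput:"


def normalize(raw_text: str, max_new_tokens: int = 256) -> str:
    # transformers is imported lazily so the prompt helper stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(raw_text), return_tensors="pt")
    # do_sample=False selects greedy decoding, matching the model's
    # deterministic-output design goal.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Because decoding is greedy (`do_sample=False`), repeated calls on the same input should produce identical output, which is the behavior the "deterministic output" bullet describes.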
Training and Performance
The model was fine-tuned on a mixed dataset of formal, conversational, and synthetic text, reaching a mean token accuracy of 97.53%. While not a full grammar-correction system, it provides reliable text cleaning, particularly for applications where meaning preservation and structural integrity are paramount. It is not intended for aggressive correction or slang expansion.
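The card does not define how mean token accuracy was computed; a common definition is the fraction of teacher-forced positions where the predicted token matches the reference token. A minimal sketch under that assumption:

```python
def mean_token_accuracy(predicted_ids, reference_ids):
    """Fraction of aligned positions where the predicted token equals the
    reference token (one common definition of mean token accuracy)."""
    if len(predicted_ids) != len(reference_ids):
        raise ValueError("sequences must be aligned position-by-position")
    matches = sum(p == r for p, r in zip(predicted_ids, reference_ids))
    return matches / len(reference_ids)


# Toy example: 3 of 4 positions match, so accuracy is 0.75.
print(mean_token_accuracy([5, 8, 13, 21], [5, 8, 99, 21]))  # → 0.75
```

Under this reading, the reported 97.53% means roughly 9,753 of every 10,000 output tokens matched the reference normalization during evaluation.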