CeluneNorm-0.6B-v1.3: Lightweight Text Normalization
CeluneNorm-0.6B-v1.3, developed by lunahr, is a 0.6 billion parameter causal language model based on Qwen3-0.6B-Base. Its primary function is to normalize poorly formatted text into clean, readable output, making it ideal for Text-to-Speech (TTS) systems and text preprocessing. The model is designed to be conservative: it preserves the original meaning, avoids sentence rewriting, and keeps domain-specific tokens such as URLs and names intact.
Key Capabilities
- Deterministic Output: Provides consistent normalization without sampling.
- Meaning Preservation: Converts text while maintaining the original intent and structure.
- Improved Punctuation: Version 1.3 significantly improves punctuation handling and sentence-boundary inference compared to previous versions.
- Mixed Text Handling: Capable of processing both natural language and technical content.
- Conservative Correction: Prioritizes safety and meaning over aggressive grammar correction, avoiding changes to slang or informal language.
- Broad Training Data: Fine-tuned on a mix of formal, conversational, and synthetic data, including dedicated casing data, for robust performance.
Usage Considerations
This model expects input in the format `YOUR INPUT<NORM>` and works reliably on sequences below 128 tokens. It is not a full grammar-correction system and may miss some nuanced corrections or preserve ambiguous casing. On its training metrics it achieves a mean token accuracy of 97.53% (99.77% for casing), which translates to roughly 90-95% human-level correctness in real-world normalization tasks.
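
The input format and decoding constraints above can be sketched as a short inference call. This is a minimal sketch, not part of the model card: it assumes the standard Hugging Face `transformers` AutoModel API, and the helper names, token limits, and generation settings are illustrative choices (greedy decoding is used to match the deterministic-output claim).

```python
# Hypothetical usage sketch for CeluneNorm-0.6B-v1.3; helper names and
# generation settings are assumptions, not documented by the model card.

MODEL_ID = "lunahr/CeluneNorm-0.6B-v1.3"
MAX_INPUT_TOKENS = 128  # the card notes reliability drops past ~128 tokens


def build_prompt(text: str) -> str:
    """Wrap raw text in the format the model expects: YOUR INPUT<NORM>."""
    return f"{text}<NORM>"


def normalize(text: str) -> str:
    # Imported lazily so build_prompt() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(text), return_tensors="pt")
    if inputs["input_ids"].shape[1] > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the ~128-token reliability window")

    # do_sample=False -> greedy decoding, i.e. the same input always yields
    # the same normalized output, matching the deterministic-output claim.
    outputs = model.generate(**inputs, do_sample=False, max_new_tokens=256)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Example (requires downloading the model weights):
# print(normalize("hello world how r u today"))
```

Greedy decoding (`do_sample=False`) is the natural fit here, since the card advertises consistent normalization without sampling; the 128-token guard simply surfaces the card's stated reliability window as an explicit error.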