lunahr/CeluneNorm-0.6B-v2.0-ctx1024
CeluneNorm-0.6B-v2.0-ctx1024 by lunahr is a 0.6 billion parameter causal language model based on Qwen3-0.6B-Base, specifically designed for lightweight text normalization in TTS and preprocessing pipelines. It converts poorly formatted English input into clean, readable text while preserving meaning and domain-specific tokens. This version improves performance for inputs up to 1024 tokens, making it suitable for longer context normalization tasks.
CeluneNorm-0.6B-v2.0-ctx1024: Text Normalization Model
CeluneNorm-0.6B-v2.0-ctx1024, developed by lunahr, is a 0.6 billion parameter causal language model fine-tuned from Qwen3-0.6B-Base. Its primary function is lightweight text normalization for English, converting poorly formatted input into clean, readable text without altering the original meaning or rewriting sentences. This model is particularly conservative, preserving domain-specific tokens like URLs, commands, and names.
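Since preserving domain-specific tokens is a stated goal, a pipeline may want to verify it after normalization. The following is a minimal sketch of such a check, limited to URLs. The regular expression and the function names `extract_urls` and `missing_protected_tokens` are hypothetical helpers for illustration, not part of the model or its tooling.

```python
import re
from typing import List

# Hypothetical pattern for tokens that should pass through unchanged (URLs only).
URL_PATTERN = re.compile(r"https?://\S+")

def extract_urls(text: str) -> List[str]:
    """Find URLs and strip sentence punctuation that may trail them."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]

def missing_protected_tokens(original: str, normalized: str) -> List[str]:
    """Return URLs from the original that do not appear verbatim in the output."""
    return [url for url in extract_urls(original) if url not in normalized]
```

An empty result means every URL in the input survived verbatim; a non-empty result is a signal to fall back to the original text for that input.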
Key Capabilities & Features
- Deterministic output: decoding avoids sampling, so the same input yields the same result.
- Preserves structure and intent of the original text.
- Handles mixed text, including natural language and technical content.
- Conservative punctuation and casing, prioritizing meaning preservation.
- Long-context normalization supporting inputs up to 1024 tokens, an improvement over previous versions.
- Trained on a mixed dataset including formal, conversational, and synthetic edge cases, with specific tuning for casing and long-context coherence.
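The deterministic-output point above can be sketched with the Hugging Face `transformers` library by disabling sampling at generation time. This is only an illustration under assumptions: passing the raw text as the prompt is a guess, since the model's expected prompt template is not described here and should be taken from the upstream model card; `max_new_tokens=1024` and the function name `normalize` are likewise hypothetical choices.

```python
# Greedy decoding settings: no sampling, so repeated runs give the same text.
# max_new_tokens is an assumed budget, not a documented value.
GENERATION_KWARGS = {"do_sample": False, "num_beams": 1, "max_new_tokens": 1024}

def normalize(text, model_id="lunahr/CeluneNorm-0.6B-v2.0-ctx1024"):
    """Run one greedy generation pass over `text` (prompt format is an assumption)."""
    # Imported lazily so the settings above can be inspected without the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

In a real pipeline the tokenizer and model would be loaded once and reused across calls rather than reloaded per input.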
Use Cases & Limitations
This model is suited to preprocessing text in Text-to-Speech (TTS) systems and general text pipelines where robust, conservative normalization is required. It is not a full grammar correction system and may miss some nuanced corrections or contractions. The model prioritizes safety and meaning preservation over aggressive correction, so it is a better fit for maintaining text integrity than for heavy rewriting.
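In a preprocessing pipeline, inputs longer than the 1024-token limit need to be split before normalization. The sketch below shows one simple way to do that, grouping words greedily under a token budget. The whitespace-based counter is a stand-in assumption; a real pipeline would count tokens with the model's own tokenizer, and the names `whitespace_token_count` and `chunk_words` are hypothetical.

```python
from typing import Callable, List

MAX_TOKENS = 1024  # input limit stated for this model version

def whitespace_token_count(text: str) -> int:
    """Stand-in counter; replace with a count from the model's tokenizer."""
    return len(text.split())

def chunk_words(
    text: str,
    max_tokens: int = MAX_TOKENS,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily group words into chunks that stay within max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        # Close the current chunk if adding this word would exceed the budget.
        if current and count_tokens(" ".join(current + [word])) > max_tokens:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on word boundaries can cut a sentence in half, which may affect casing and punctuation at chunk edges; splitting at line or paragraph breaks where they exist is likely a gentler choice. A single word that exceeds the budget on its own is passed through as its own chunk.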