Name: lunahr/CeluneNorm-0.6B-v2.0-ctx2048 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lunahr

Overview

CeluneNorm-0.6B-v2.0-ctx2048 is a 0.8-billion-parameter causal language model developed by lunahr, based on Qwen3-0.6B-Base. It is designed for text normalization: it transforms poorly formatted input into clean, readable text without altering the original meaning or intent. Version 2.0 improves performance on longer contexts and supports inputs of up to 2048 tokens, so it can normalize longer text segments than its predecessor.
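Because inputs are limited to 2048 tokens, longer documents need to be split before normalization. The following is a minimal sketch of one way to do that, not a method described by the model author. The helper name `split_into_chunks`, the line-based splitting, and the default budget of half the context (leaving room for the generated output) are all assumptions; `count_tokens` stands in for whatever tokenizer-based counter you use.

```python
from typing import Callable, List

MAX_CONTEXT_TOKENS = 2048  # input limit stated for this model version


def split_into_chunks(
    text: str,
    count_tokens: Callable[[str], int],
    budget: int = MAX_CONTEXT_TOKENS // 2,  # assumption: reserve half for the output
) -> List[str]:
    """Greedily pack whole lines into chunks whose token count stays within budget.

    A single line that exceeds the budget on its own is emitted as its own
    oversized chunk; this sketch does not split inside a line.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        candidate = "\n".join(current + [line])
        if current and count_tokens(candidate) > budget:
            # Adding this line would overflow, so close the current chunk.
            chunks.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

In practice `count_tokens` could wrap the model's tokenizer, for example `lambda s: len(tokenizer(s)["input_ids"])`; a whitespace word count is only a rough stand-in.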

Key Capabilities

Lightweight Text Normalization: Converts informal or poorly formatted text into a standardized, readable format.
Meaning Preservation: Conservatively avoids rewriting sentences or changing the original meaning.
Domain-Specific Token Handling: Preserves URLs, commands, names, and other domain-specific tokens.
Long Context Support: Handles normalization for inputs up to 2048 tokens, an improvement over previous versions.
Deterministic Output: Provides consistent normalization without requiring sampling.
Mixed Text Handling: Capable of processing text containing both natural language and technical content.
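The deterministic-output point above corresponds to greedy decoding, which can be sketched with the Hugging Face `transformers` library as follows. This is an illustration under assumptions rather than the author's documented usage: it assumes the model is loadable under the repository id `lunahr/CeluneNorm-0.6B-v2.0-ctx2048` and that the raw text can be passed directly as the prompt. The full model card may specify a different prompt template. The helper names `greedy_generation_kwargs` and `normalize` are hypothetical.

```python
def greedy_generation_kwargs(max_new_tokens: int = 512) -> dict:
    """Generation settings that disable sampling, so output is repeatable."""
    return {"do_sample": False, "num_beams": 1, "max_new_tokens": max_new_tokens}


def normalize(text: str, model_id: str = "lunahr/CeluneNorm-0.6B-v2.0-ctx2048") -> str:
    """Normalize one text segment with greedy decoding (sketch)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # ASSUMPTION: the raw text is the prompt; check the model card for a template.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, **greedy_generation_kwargs())

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Loading the model inside `normalize` keeps the sketch short; for repeated calls you would load the tokenizer and model once and reuse them.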

Training and Performance

The model was fine-tuned on a mixed dataset including formal text, conversational text, synthetic edge cases, and quoted text. It also received additional tuning for casing accuracy and long-context coherence. Training metrics show a mean token accuracy of 97.53%, with real-world human-level correctness estimated at 90-95%.
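The source does not say how the mean token accuracy figure was computed. As a point of reference, the metric is commonly defined as the fraction of target positions where the predicted token id equals the label, skipping masked positions. The sketch below implements that common definition; the function name and the `-100` ignore value are assumptions, not details from the training setup.

```python
from typing import Sequence

IGNORE_INDEX = -100  # common convention for masked label positions (assumption)


def mean_token_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of non-masked positions where the predicted id matches the label."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    # Drop positions that the loss would ignore (e.g. prompt or padding tokens).
    scored = [(p, y) for p, y in zip(predicted, labels) if y != IGNORE_INDEX]
    if not scored:
        return 0.0
    return sum(p == y for p, y in scored) / len(scored)
```

Token accuracy counts individual tokens, so a single wrong token in an otherwise correct sentence barely moves it. That is one reason a token-level score can sit above the estimated rate of fully correct outputs.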

Limitations

CeluneNorm is not a full grammar correction system. It may miss some punctuation or casing corrections, it is conservative with contractions, and it may preserve ambiguous casing. It prioritizes safety and meaning preservation over aggressive correction.
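Since the model is meant to change only surface form, a caller can add a cheap guard that flags any output whose letters and digits differ from the input. The following is a hypothetical downstream check, not part of the model or its documentation; the function names are made up for illustration.

```python
def alnum_skeleton(text: str) -> str:
    """Letters and digits only, case-folded; ignores casing, punctuation, spacing."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


def only_surface_changes(original: str, normalized: str) -> bool:
    """True if the two texts differ only in casing, punctuation, or whitespace."""
    return alnum_skeleton(original) == alnum_skeleton(normalized)
```

A `False` result does not prove the output is wrong; it only marks the pair for review, since any change to a word's spelling will also trip the check.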
