Name: juanquivilla/sotto-cleanup-lfm25-350m API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: juanquivilla

Overview

juanquivilla/sotto-cleanup-lfm25-350m is a 350-million-parameter, full-precision (bf16) fine-tune of the LiquidAI/LFM2.5-350M-Base model. It is built for on-device cleanup of speech-to-text transcripts, that is, refining raw ASR output.

Key Differentiators

Model Souping: The model is a weight-space average (a "soup") of two checkpoints, v55 and v51, from the same fine-tuning lineage. The average recovers v51's adversarial sampling performance while retaining v55's gains in number accuracy and filler stripping.
Optimized for Accuracy: In production-mode evaluations, the model reaches 96.5% number accuracy and scores 86.4% on the adversarial benchmark with greedy decoding. It also performs well at reducing sub-deletion and sampling loops.
Specialized Training: The training pipeline included GRPO (Group Relative Policy Optimization) with substantive-deletion-aware rewards, augmented number examples, and anti-loop n-gram penalties.
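
To make the souping idea concrete, here is a minimal sketch of weight-space averaging between two checkpoints. The function name soup_weights, the toy parameter names, and the 50/50 mixing ratio (alpha=0.5) are assumptions for illustration; the listing does not state the ratio or the tooling actually used for v55 and v51.

```python
from typing import Dict, TypeVar

T = TypeVar("T")  # any value supporting * and +, e.g. float, numpy array, torch tensor


def soup_weights(state_a: Dict[str, T], state_b: Dict[str, T], alpha: float = 0.5) -> Dict[str, T]:
    """Return the per-parameter weighted average of two checkpoints.

    alpha is the weight given to state_a; 0.5 (an assumed value) is a plain mean.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: alpha * state_a[name] + (1.0 - alpha) * state_b[name]
        for name in state_a
    }


# Toy stand-ins for two checkpoints; real state dicts would map names to tensors.
v55 = {"layer.weight": 1.0, "layer.bias": 0.0}
v51 = {"layer.weight": 3.0, "layer.bias": 2.0}

souped = soup_weights(v55, v51)
print(souped)
```

Averaging only makes sense when both checkpoints share the same architecture and parameter names, which is why the sketch checks the keys before mixing.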

Recommended Usage

For best performance, especially on Apple Silicon, the MLX 5-bit variant is recommended. Use the following inference settings:

repetition_penalty=1.05 to prevent rare 5-gram loops.
max_new_tokens >= 1.5 × input_word_count (or 900 minimum) to avoid content truncation.
do_sample=False for deterministic greedy output, or temperature=0.1, top_k=50 for sampling.
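
The settings above can be collected into a small helper. This is a sketch only: the function name cleanup_generation_kwargs is hypothetical, the keyword names follow the ones listed above (which match Hugging Face transformers generate() arguments), and reading the token budget as the larger of 900 and 1.5 × the input word count is an assumption about what "or 900 minimum" means.

```python
import math


def cleanup_generation_kwargs(transcript: str, sample: bool = False) -> dict:
    """Build generation settings for a transcript (hypothetical helper)."""
    word_count = len(transcript.split())
    # Assumed reading: at least 1.5 tokens per input word, never below 900.
    max_new_tokens = max(900, math.ceil(1.5 * word_count))

    kwargs = {
        "repetition_penalty": 1.05,  # guards against rare 5-gram loops
        "max_new_tokens": max_new_tokens,
        "do_sample": sample,         # False gives deterministic greedy output
    }
    if sample:
        kwargs.update(temperature=0.1, top_k=50)
    return kwargs


short = cleanup_generation_kwargs("um so the the meeting is at three")
long_sampled = cleanup_generation_kwargs("word " * 1000, sample=True)
print(short)
print(long_sampled)
```

With a transformers model, the resulting dictionary could be passed as model.generate(**inputs, **kwargs). Other runtimes, including MLX, may use different argument names for the same settings.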
