bingbangboom/Qwen352B-transcriber-new

VISIONConcurrency Cost:1Model Size:2.3BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Mar 8, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The bingbangboom/Qwen352B-transcriber-new is a 2.3 billion parameter Qwen3.5-based model developed by bingbangboom, fine-tuned from unsloth/Qwen3.5-2B. This model functions as a post-processor for local Automatic Speech Recognition (ASR) outputs, designed to transform raw transcripts into clean, polished, and coherent written text. It excels at noise reduction, grammar correction, punctuation injection, and contextual repair, making it ideal for dictation app integration.

Loading preview...

Model Overview

The bingbangboom/Qwen352B-transcriber-new is a 2.3 billion parameter Qwen3.5-based model, developed by bingbangboom and fine-tuned from unsloth/Qwen3.5-2B. Its primary function is to act as a post-processor for local Automatic Speech Recognition (ASR) systems, transforming raw, dictated transcripts into polished, coherent written text. The model was trained 2x faster using Unsloth and Huggingface's TRL library.

Key Capabilities

  • Noise Reduction: Removes filler words, false starts, stutters, and accidental repetitions.
  • Self-Correction Handling: Outputs only the intended, corrected version when a speaker self-corrects.
  • Grammar and Punctuation: Fixes grammar, spelling, and punctuation errors, proactively injecting necessary punctuation.
  • Contextual Repair: Uses surrounding context to reconstruct logical meaning from grammatically correct but nonsensical phrases.
  • Voice and Tone Preservation: Maintains the speaker's natural voice, tone, intent, and formality, preserving technical terms and proper nouns.
  • Data Formatting: Converts spoken numbers, dates, times, and currency into standard written formats, and standardizes titles.
  • Smart Structural Formatting: Applies formatting like bullet points, numbered lists, and paragraph breaks to improve readability.

Recommended Usage

This model is specifically designed for integration into speech-to-text dictation applications. It operates under a strict system prompt to ensure only the corrected transcript is outputted, without introductions or meta-commentary. Recommended settings include a Temperature of 0 for deterministic output, top_k = 40, top_p = 0.95, min_p = 0.05, and a repeat_penalty = 1.1.