dataslab/DSLM-LST-9B

Vision | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: May 15, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

DSLM-LST-9B is a 9-billion-parameter derivative of Qwen3.5-9B developed by dataslab and refined with Language Selection Tuning (LST). The technique is designed to suppress unintended Chinese character generation in non-Chinese outputs (e.g., English, Korean, Japanese) while preserving the base model's reasoning performance and multimodal capabilities. It targets multilingual environments where language confusion is a common issue and clean, single-language responses matter.

Overview

DSLM-LST-9B is a 9-billion-parameter model from dataslab, built on Qwen3.5-9B. Its core innovation is Language Selection Tuning (LST), a learning-based method designed to prevent Chinese characters from leaking into non-Chinese outputs. This addresses "language confusion," a common problem in multilingual LLMs, especially those trained on Chinese-rich corpora, and yields cleaner, more reliable responses for users of languages such as Korean, English, or Japanese.
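
One rough way to quantify the leakage described above is to count Han ideographs in responses that should contain none. The sketch below is not dataslab's evaluation method; the function names and the chosen code point ranges are illustrative assumptions. It suits Korean or English outputs. Japanese legitimately uses kanji from the same Unicode blocks, so a code point check alone cannot separate Chinese leakage from valid Japanese text.

```python
# Hypothetical helper names; ranges cover the main Han ideograph blocks only.
HAN_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)


def is_han(ch: str) -> bool:
    """Return True if the single character falls in one of HAN_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def han_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Han ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_han(c) for c in chars) / len(chars)


def leak_rate(responses: list[str]) -> float:
    """Fraction of responses that contain at least one Han ideograph."""
    if not responses:
        return 0.0
    leaked = sum(any(is_han(c) for c in r) for r in responses)
    return leaked / len(responses)
```

Running `leak_rate` over the same Korean or English prompt set for both the base model and DSLM-LST-9B would give a simple before/after comparison. Prompts that explicitly request Chinese should be excluded, since Chinese output is expected there.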

Key Capabilities & Differentiators

  • Effective Chinese-leak Suppression: Significantly reduces the occurrence of Chinese characters in non-Chinese outputs without blanket suppression, meaning it still generates fluent Chinese when explicitly requested.
  • Preserved Reasoning & Multimodal Performance: The LST adjustment is minimal, so the base model's reasoning capabilities are maintained (KMMLU, HumanEval, and GSM8K scores are comparable to or slightly better than the base model's), as are its vision/multimodal components.
  • Persistence Through Fine-tuning: The Chinese-leak suppression effect is robust and persists even after full-parameter Supervised Fine-Tuning (SFT), unlike post-hoc decoding tricks.
  • Bit-Identical Core: Most of the network weights remain bit-identical to the base Qwen3.5-9B, and the tokenizer and chat template are unchanged, ensuring compatibility with existing integrations.
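
The Bit-Identical Core point can be checked directly by comparing the two checkpoints tensor by tensor. The following is a minimal sketch under a stated assumption: each checkpoint has already been exported as a mapping from tensor name to raw bytes (how that export is done is left out, and the tensor names in the usage example are hypothetical). It does not reflect which tensors LST actually changes.

```python
from typing import Mapping


def diff_checkpoints(
    base: Mapping[str, bytes], tuned: Mapping[str, bytes]
) -> dict[str, list[str]]:
    """Classify tensor names by whether their raw bytes match exactly."""
    shared = base.keys() & tuned.keys()
    # A tensor counts as changed if even one byte differs.
    changed = {name for name in shared if base[name] != tuned[name]}
    return {
        "identical": sorted(shared - changed),
        "changed": sorted(changed),
        "only_in_base": sorted(base.keys() - tuned.keys()),
        "only_in_tuned": sorted(tuned.keys() - base.keys()),
    }


if __name__ == "__main__":
    # Toy data with made-up tensor names, for illustration only.
    base = {"layer.0.weight": b"\x00\x01", "layer.1.weight": b"\x02\x03"}
    tuned = {"layer.0.weight": b"\x00\x01", "layer.1.weight": b"\x02\xff"}
    report = diff_checkpoints(base, tuned)
    print(report["identical"], report["changed"])
```

The length of the `identical` list relative to the total gives a quick measure of how much of the network an existing integration can treat as unchanged.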

Ideal Use Cases

  • Multilingual Applications: Particularly beneficial for applications serving users in non-Chinese-speaking regions (e.g., Korea, Japan) where unintended Chinese output degrades user experience.
  • Downstream Fine-tuning: Suitable as a base for further SFT or RLHF, as its language selection behavior is designed to be stable through additional training stages.
  • Reasoning Tasks: Can be used for complex reasoning tasks, including those requiring a "thinking mode," with reduced risk of language confusion within internal thought processes.