Name: dataslab/DSLM-LST-9B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: dataslab

Overview

dataslab/DSLM-LST-9B is a 9-billion-parameter model based on Qwen3.5, developed by dataslab. Its core innovation is Language Selection Tuning (LST), a learning-based technique designed to prevent unintended Chinese character leakage in responses to non-Chinese prompts (e.g., Korean, English, Japanese). Unlike post-hoc decoding methods, LST modifies the model's internal language-selection behavior, so its effect is designed to persist even after further full-parameter fine-tuning (SFT/RLHF).
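For contrast, the sketch below shows what a simple post-hoc decoding filter looks like: every token whose decoded text contains a Han ideograph is assigned a logit of negative infinity before sampling. This is not LST and not part of this model; the toy vocabulary and function names are hypothetical, chosen only to illustrate the kind of method LST is described as replacing.

```python
import math

# Hypothetical toy vocabulary (token id -> decoded piece). A real tokenizer
# has on the order of 10^5 entries; this one only illustrates the idea.
TOY_VOCAB = {0: "Hello", 1: " world", 2: "안녕", 3: "你好", 4: "こんにちは", 5: "的"}


def contains_han(text: str) -> bool:
    """True if text has a CJK Unified Ideograph (basic block or Extension A)."""
    return any(
        "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
        for ch in text
    )


def banned_token_ids(vocab: dict[int, str]) -> set[int]:
    """Collect every token id whose decoded piece contains a Han character."""
    return {tid for tid, piece in vocab.items() if contains_han(piece)}


def mask_logits(logits: list[float], banned: set[int]) -> list[float]:
    """Post-hoc filter: force banned tokens to -inf so they can never be sampled."""
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


banned = banned_token_ids(TOY_VOCAB)
print(sorted(banned))
print(mask_logits([1.0] * len(TOY_VOCAB), banned))
```

A filter like this lives outside the weights, so it must be re-applied at every deployment and is unaffected by fine-tuning in either direction. It is also blunt: Japanese kanji are Han ideographs too, so the same mask would block legitimate Japanese text, and it cannot allow Chinese when the user explicitly asks for it.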

Key Capabilities & Features

Chinese-Leak Suppression: Significantly reduces the occurrence of Chinese characters in non-Chinese outputs, improving readability and user trust for multilingual applications.
Preserved Selectivity: The model retains the ability to generate fluent Chinese when explicitly requested by the user, avoiding blanket suppression.
Reasoning Performance: Benchmarks on KMMLU, HumanEval, and GSM8K show reasoning and task performance remain on par with, or slightly exceed, the base Qwen3.5-9B model.
Persistence Through SFT: The Chinese-leak suppression effect is highly stable and largely unaffected by subsequent full-parameter SFT stages, as demonstrated by a Suppression Retention Rate (SRR) close to 1.0.
Bit-Identical Core: Most of the network, including the tokenizer, chat template, and vision tower, is preserved bit-identical to the base model, ensuring compatibility with existing integrations and retaining multimodal capabilities.
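The listing does not define how leakage or the Suppression Retention Rate (SRR) is computed. The sketch below is one plausible way to measure them, and both definitions are assumptions: the leak rate is taken as the fraction of responses to non-Chinese prompts that contain any Han character, and SRR is taken as the share of the base-to-LST leak reduction that survives a later SFT stage (1.0 meaning fully retained).

```python
def contains_han(text: str) -> bool:
    """True if text has a CJK Unified Ideograph (basic block or Extension A)."""
    return any(
        "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
        for ch in text
    )


def leak_rate(responses: list[str]) -> float:
    """ASSUMED metric: fraction of responses containing at least one Han character."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(contains_han(r) for r in responses) / len(responses)


def suppression_retention_rate(base_leak: float, lst_leak: float,
                               lst_after_sft_leak: float) -> float:
    """ASSUMED definition of SRR: reduction kept after SFT / reduction from LST.

    1.0 means SFT preserved all of LST's reduction; 0.0 means it was fully lost.
    """
    reduction = base_leak - lst_leak
    if reduction <= 0:
        raise ValueError("SRR is undefined unless LST reduces leakage")
    return (base_leak - lst_after_sft_leak) / reduction
```

A character-range check is only meaningful for prompts in languages that do not use Han script, such as Korean or English. Japanese legitimately uses kanji, so evaluating Japanese outputs would need a more careful detector than this one.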

Use Cases

Multilingual Applications: Ideal for applications serving non-Chinese users where unintended Chinese output is undesirable, particularly for Korean, English, and Japanese language tasks.
Downstream Fine-tuning: Suitable as a base for further fine-tuning, as its core language selection improvements are designed to persist.
Complex Reasoning: Supports a "Thinking mode" for complex reasoning tasks, with Chinese leakage suppressed even in the model's thinking output.
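To check leakage separately in the thinking trace and in the final answer, the raw output has to be split first. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in the Qwen3 chat template; whether this model and the serving API emit the tags in that form is an assumption to verify. If the opening tag is injected by the template and stripped from the output, the pattern would need adjusting.

```python
import re

# ASSUMPTION: reasoning is delimited by <think>...</think>, as in Qwen3 templates.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def contains_han(text: str) -> bool:
    """True if text has a CJK Unified Ideograph (basic block or Extension A)."""
    return any(
        "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
        for ch in text
    )


def split_thinking(output: str) -> tuple[str, str]:
    """Return (thinking, answer): all tagged spans joined, and the rest stripped."""
    thinking = "".join(THINK_RE.findall(output))
    answer = THINK_RE.sub("", output).strip()
    return thinking, answer


def leak_report(output: str) -> dict[str, bool]:
    """Flag Han characters separately in the thinking trace and the answer."""
    thinking, answer = split_thinking(output)
    return {"thinking_leak": contains_han(thinking), "answer_leak": contains_han(answer)}


print(leak_report("<think>사용자가 인사했다. 所以</think>안녕하세요!"))
```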

Limitations

Not an Instruction-Tuned Chat Model: Inherits conversational behavior and instruction-following style from the base model; LST primarily addresses language leakage.
Degraded Chinese Generation: Quality for tasks explicitly requiring Chinese output (e.g., translation, Chinese code comments) will be lower than the base Qwen3.5-9B.
Multimodal Benchmarking: While vision capabilities are preserved, they have not been re-benchmarked in this release.
