shisa-ai/shisa-v2.1-qwen3-8b

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32K · Published: Nov 20, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

shisa-ai/shisa-v2.1-qwen3-8b is an 8 billion parameter, Qwen3-based bilingual Japanese and English chat model developed by Shisa.AI, featuring a 32K token context length. It is part of the Shisa V2.1 family, which focuses on enhancing Japanese language performance while maintaining strong English capabilities. This model demonstrates significant improvements in Japanese language benchmarks and reduced cross-lingual token leakage compared to previous versions, making it suitable for general-purpose chat applications requiring robust bilingual support.


Shisa V2.1 Qwen3-8B Overview

shisa-ai/shisa-v2.1-qwen3-8b is an 8 billion parameter, Qwen3-based model from Shisa.AI, designed for bilingual Japanese and English chat. It is part of the Shisa V2.1 series, which incorporates an updated dataset and refined training recipes, including SFT, DPO, and in some cases, model-merging and RL stages. This model offers a 32K token context length and is optimized for local and edge-based use cases.
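Since this is a Qwen3-based chat model, it should accept the ChatML-style prompt format Qwen models use. The sketch below builds such a prompt by hand for illustration; the exact template string is an assumption, and in practice `tokenizer.apply_chat_template` from `transformers` is the safer path.

```python
# Minimal sketch: render chat messages into a ChatML-style prompt string.
# The <|im_start|>/<|im_end|> template is an assumption based on the Qwen
# chat format; prefer tokenizer.apply_chat_template in real deployments.

def build_chat_prompt(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} messages into a ChatML-style string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave an open assistant turn to cue the model's response.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "Hello! / こんにちは！"},
]
prompt = build_chat_prompt(messages)
```

The same message list can be fed directly to an OpenAI-compatible chat endpoint if the model is served that way, in which case the server applies the template for you.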

Key Capabilities & Performance

  • Enhanced Japanese Language Performance: Achieves a JA AVG score of 67.8 and an EN AVG score of 57.8 on Shisa.AI's internal multi-eval test battery, a substantial gain over its Shisa V2 predecessors. For instance, it scores 7.783 on JA MT-Bench (GPT-4-Turbo judge).
  • Reduced Cross-Lingual Token Leakage (CLTL): Demonstrates a 5.0x improvement in mitigating CLTL compared to the base Qwen3-8B model, with leakage reduced to 0.44%. This is crucial for high-quality Japanese language generation, preventing the output of non-Japanese words or sub-words.
  • Robust Instruction Following: Benefits from new dataset additions and re-tuned training for better instruction following, translation, and politeness, specifically tailored for Japanese language nuances.
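The CLTL figures above come from Shisa.AI's own evaluation, whose exact methodology is not described here. A crude but useful proxy is the fraction of letters in a Japanese response that fall outside Japanese Unicode script ranges; the function below is an illustrative sketch of that idea, not the official metric.

```python
# Illustrative proxy for cross-lingual token leakage (CLTL): the fraction
# of letters in a Japanese response that are not hiragana, katakana, or
# kanji. This is NOT Shisa.AI's official measurement methodology.

def leakage_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters outside Japanese script ranges."""
    def is_japanese(ch: str) -> bool:
        cp = ord(ch)
        return (
            0x3040 <= cp <= 0x309F      # hiragana
            or 0x30A0 <= cp <= 0x30FF   # katakana
            or 0x4E00 <= cp <= 0x9FFF   # CJK unified ideographs (kanji)
        )
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    foreign = sum(1 for ch in letters if not is_japanese(ch))
    return foreign / len(letters)
```

A fully Japanese sentence scores 0.0, while a response that slips English words into Japanese output scores above zero; digits and punctuation are ignored.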

Ideal Use Cases

  • Bilingual Chatbots: Excellent for applications requiring high-quality conversational abilities in both Japanese and English.
  • Japanese Language Processing: Suited for tasks demanding precise Japanese output, such as translation, customer service, or content generation, where minimizing language confusion is critical.
  • Edge and Local Deployments: Its 8B parameter size makes it viable for deployment in environments with limited computational resources.
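For capacity planning on edge hardware, a rough weight-memory estimate follows from parameter count times bytes per parameter (1 byte at the listed FP8 quantization); KV-cache and activation memory are extra and grow with context length. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope weight memory for an ~8B-parameter model at
# different precisions. KV-cache and activations are additional overhead.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB."""
    return num_params * bytes_per_param / 1024**3

PARAMS = 8e9  # ~8 billion parameters

for name, bpp in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, bpp):.1f} GiB")
```

At the model's FP8 quantization this works out to roughly 7.5 GiB of weights, which is what makes a 32K-context 8B model plausible on a single consumer GPU.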