Shisa V2.1 Qwen3-8B Overview
shisa-ai/shisa-v2.1-qwen3-8b is an 8-billion-parameter, Qwen3-based model from Shisa.AI, designed for bilingual Japanese and English chat. It is part of the Shisa V2.1 series, which incorporates an updated dataset and refined training recipes, including SFT, DPO, and, in some cases, model-merging and RL stages. The model offers a 32K-token context length and is optimized for local and edge-based use cases.
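As a Qwen3-based chat model, it expects a ChatML-style prompt format. The sketch below hand-rolls that template purely for illustration; it assumes the standard Qwen3 `<|im_start|>`/`<|im_end|>` markers, and in practice you would load the model's tokenizer from the `transformers` library and call `tokenizer.apply_chat_template` instead.

```python
# Minimal sketch of a ChatML-style prompt for a Qwen3-based chat model.
# Assumption: the model follows Qwen3's <|im_start|>/<|im_end|> format.
# In real use, prefer tokenizer.apply_chat_template from transformers,
# which applies the exact template shipped with the model.

def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into ChatML text."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the prompt open for the assistant's reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "日本の首都はどこですか？"},
]
print(build_chatml_prompt(messages))
```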
Key Capabilities & Performance
- Enhanced Japanese Language Performance: Achieves a JA AVG score of 67.8 and an EN AVG score of 57.8 on Shisa.AI's internal multieval test battery, showing substantial gains over its Shisa V2 predecessors. For instance, it scores 7.783 on JA MT-Bench (GPT-4-Turbo judge).
- Reduced Cross-Lingual Token Leakage (CLTL): Demonstrates a 5.0x improvement in mitigating CLTL compared to the base Qwen3-8B model, with leakage reduced to 0.44%. This is crucial for high-quality Japanese generation, as it prevents stray non-Japanese words or sub-words from appearing in the output.
- Robust Instruction Following: Benefits from new dataset additions and re-tuned training for better instruction following, translation, and politeness, specifically tailored for Japanese language nuances.
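To make the CLTL idea concrete, here is a naive character-script heuristic that flags Latin-script characters in text expected to be Japanese. This is an illustrative sketch only, not Shisa.AI's multieval CLTL metric, and the function name and threshold are assumptions for the example.

```python
# Naive cross-lingual leakage check: measure the fraction of Latin
# letters in text that is expected to be Japanese. Illustrative only;
# not the multieval CLTL metric used by Shisa.AI.
import re

LATIN = re.compile(r"[A-Za-z]")

def latin_leakage_ratio(text):
    """Fraction of non-whitespace characters that are Latin letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    latin = sum(1 for c in chars if LATIN.match(c))
    return latin / len(chars)

print(latin_leakage_ratio("東京は日本の首都です。"))      # 0.0, no leakage
print(latin_leakage_ratio("東京はJapanの首都です。"))     # > 0, "Japan" leaked
```

Note that legitimate Japanese text often contains Latin script (product names, acronyms), so a real leakage metric has to be token-aware rather than a raw character count.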
Ideal Use Cases
- Bilingual Chatbots: Excellent for applications requiring high-quality conversational abilities in both Japanese and English.
- Japanese Language Processing: Suited for tasks demanding precise Japanese output, such as translation, customer service, or content generation, where minimizing language confusion is critical.
- Edge and Local Deployments: Its 8B parameter size makes it viable for deployment in environments with limited computational resources.
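As a quick sanity check on the edge-deployment claim, the arithmetic below estimates weight-only memory for an 8B-parameter model at common precisions. These are rough figures that exclude the KV cache, activations, and quantization overhead, which add real memory on top.

```python
# Back-of-the-envelope weight-memory estimate for an 8B-parameter model.
# Rough sketch: ignores KV cache, activations, and quantization
# overhead (scales/zero-points).

def weight_memory_gib(n_params, bits_per_param):
    """Approximate weight storage in GiB."""
    return n_params * bits_per_param / 8 / (1024 ** 3)

N = 8e9  # ~8 billion parameters
for name, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_memory_gib(N, bits):.1f} GiB")
# fp16/bf16: ~14.9 GiB, int8: ~7.5 GiB, int4: ~3.7 GiB
```

This is why 4-bit quantized builds of 8B models fit comfortably on consumer GPUs and recent laptops, while full-precision weights already approach the limit of a 16 GB device.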