shisa-ai/shisa-v2-mistral-small-24b

Text Generation · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Ctx Length: 32k · Published: Apr 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Shisa V2 is a family of bilingual Japanese and English (JA/EN) general-purpose chat models developed by Shisa.AI; shisa-v2-mistral-small-24b is the 24B-parameter member of that family. The models are optimized for Japanese language tasks while maintaining strong English capabilities, relying on post-training optimization rather than tokenizer extension or continued pre-training. Shisa V2 uses an expanded and refined synthetic-data-driven approach that yields substantial gains in Japanese language performance, making it well suited to applications that need robust bilingual chat, particularly in Japanese contexts.


Shisa V2: Bilingual Japanese/English Chat Models

Shisa V2 is a series of general-purpose chat models developed by Shisa.AI, designed to excel in Japanese language tasks while retaining robust English capabilities. Unlike previous iterations, Shisa V2 focuses on optimizing post-training through an expanded and refined synthetic-data driven approach, rather than tokenizer extension or costly continued pre-training.
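The snippet below is a minimal sketch of local chat inference with Hugging Face transformers, using the model ID from this card. The chat-template handling follows the standard transformers API; the prompt and generation settings are illustrative placeholders, not the card's recommended values.

```python
# Minimal inference sketch (illustrative settings, not official recommendations).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2-mistral-small-24b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # requires `accelerate`
    torch_dtype="auto",
)

# Shisa V2 inherits the chat template from its base model, so
# apply_chat_template handles the prompt formatting.
messages = [
    {"role": "user", "content": "日本の四季についてひとこと教えてください。"},  # "Tell me briefly about Japan's four seasons."
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```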

Key Capabilities & Features

  • Bilingual Proficiency: Strong performance in both Japanese and English, with a particular emphasis on Japanese output quality.
  • Optimized Post-Training: Achieves significant performance gains through advanced synthetic data and fine-tuning techniques.
  • Robust Model Family: Part of a diverse family ranging from 7B to 70B parameters, all trained with consistent datasets and recipes.
  • Extensive Evaluation: Benchmarked using a custom "multieval" harness, including standard and newly developed Japanese-specific evaluations like shisa-jp-ifeval, shisa-jp-rp-bench, and shisa-jp-tl-bench.
  • Flexible Usage: Inherits the chat template from its base model and is validated for inference with vLLM and SGLang; recommended temperature and top_p/min_p settings vary by task (see the sketch after this list).
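As a concrete illustration of the sampling point above, here is a hedged sketch of offline inference with vLLM, one of the engines the card lists as validated. The temperature/top_p/min_p values are placeholders chosen for illustration; consult the upstream Shisa V2 model card for the actual per-task recommendations.

```python
# vLLM inference sketch; sampling values below are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(model="shisa-ai/shisa-v2-mistral-small-24b")

# Two hypothetical sampling profiles for different tasks
# (e.g., conservative decoding for translation, looser for chat/role-play).
translation_params = SamplingParams(temperature=0.2, min_p=0.1, max_tokens=512)
chat_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompt = "次の文を英語に翻訳してください: 猫が縁側で昼寝をしています。"  # "Translate to English: A cat is napping on the veranda."
outputs = llm.chat(
    [{"role": "user", "content": prompt}],
    sampling_params=translation_params,
)
print(outputs[0].outputs[0].text)
```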

Training & Datasets

Shisa V2 models were trained using a comprehensive supervised fine-tuning (SFT) dataset of approximately 360K samples, including a filtered and regenerated version of shisa-ai/shisa-v2-sharegpt, translated prompts, and custom role-playing and instruction-following data. The DPO mix, though smaller, includes English-only preference data that surprisingly outperformed larger JA/EN sets, alongside specific DPO sets for role-playing, translation, instruction-following, and politeness control.
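For readers who want to see what fine-tuning on the public SFT data could look like, the following is a minimal sketch using TRL's SFTTrainer. It is not the authors' training recipe: the ShareGPT-style field names ("conversations", "from", "value"), the base model ID, and all hyperparameters are assumptions for illustration only.

```python
# SFT sketch on the public dataset named in this card (not the official recipe).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("shisa-ai/shisa-v2-sharegpt", split="train")

# Convert ShareGPT-style turns ({"from", "value"}) into the {"role", "content"}
# messages format that SFTTrainer applies the chat template to.
# Field names are assumed; check the dataset card before running.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def to_messages(example):
    return {
        "messages": [
            {"role": ROLE_MAP.get(turn["from"], "user"), "content": turn["value"]}
            for turn in example["conversations"]
        ]
    }

dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # assumed base model ID
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="shisa-v2-sft-sketch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
```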