shisa-ai/shisa-v2-mistral-small-24b
Shisa V2 is a family of bilingual Japanese and English (JA/EN) general-purpose chat models developed by Shisa.AI; this specific model is shisa-v2-mistral-small-24b. These models are optimized for Japanese language tasks while maintaining strong English capabilities, focusing on post-training optimization rather than tokenizer extension or continued pre-training. The Shisa V2 models leverage an expanded and refined synthetic-data-driven approach, achieving substantial performance gains in Japanese language processing. They are suitable for applications requiring robust bilingual chat functionality, particularly in Japanese contexts.
Shisa V2: Bilingual Japanese/English Chat Models
Shisa V2 is a series of general-purpose chat models developed by Shisa.AI, designed to excel in Japanese language tasks while retaining robust English capabilities. Unlike previous iterations, Shisa V2 focuses on optimizing post-training through an expanded and refined synthetic-data-driven approach, rather than on tokenizer extension or costly continued pre-training.
Key Capabilities & Features
- Bilingual Proficiency: Strong performance in both Japanese and English, with a particular emphasis on Japanese output quality.
- Optimized Post-Training: Achieves significant performance gains through advanced synthetic data and fine-tuning techniques.
- Robust Model Family: Part of a diverse family ranging from 7B to 70B parameters, all trained with consistent datasets and recipes.
- Extensive Evaluation: Benchmarked using a custom "multieval" harness, including standard benchmarks and newly developed Japanese-specific evaluations such as `shisa-jp-ifeval`, `shisa-jp-rp-bench`, and `shisa-jp-tl-bench`.
- Flexible Usage: Inherits chat templates from base models and is validated for inference with vLLM and SGLang, with recommended `temperature` and `top_p`/`min_p` settings for different tasks.
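Since the model is validated for serving with vLLM and SGLang, both of which expose an OpenAI-compatible endpoint, a minimal sketch of assembling a chat request might look like the following. The sampling values here are illustrative placeholders, not the card's official recommendations, and the helper function name is our own:

```python
import json

# Model identifier from this card; the sampling defaults below are
# illustrative assumptions, not official recommendations.
MODEL_ID = "shisa-ai/shisa-v2-mistral-small-24b"

def build_chat_request(user_message: str,
                       temperature: float = 0.7,
                       top_p: float = 0.9) -> dict:
    """Assemble an OpenAI-compatible /v1/chat/completions payload,
    suitable for POSTing to a vLLM or SGLang server hosting the model."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
        "top_p": top_p,
    }

# Example: a Japanese prompt, serialized for sending to the server.
payload = build_chat_request("日本語で自己紹介してください。")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Because the server side handles the chat template inherited from the base model, the client only needs to supply plain `messages`; per-task tuning is then a matter of adjusting the sampling parameters in the payload.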
Training & Datasets
Shisa V2 models were trained on a supervised fine-tuning (SFT) dataset of approximately 360K samples, including a filtered and regenerated version of shisa-ai/shisa-v2-sharegpt, translated prompts, and custom role-playing and instruction-following data. The DPO mix, though smaller, includes English-only preference data that surprisingly outperformed larger JA/EN sets, alongside dedicated DPO sets for role-playing, translation, instruction following, and politeness control.