SakanaAI/TinySwallow-1.5B-Instruct

Parameters: 1.5B
Precision: BF16
Context length: 131,072 tokens
License: apache-2.0

TinySwallow-1.5B-Instruct Overview

TinySwallow-1.5B-Instruct is a 1.5-billion-parameter instruction-tuned language model developed by Sakana AI, focused primarily on Japanese. It is the instruction-tuned version of TinySwallow-1.5B and was created with TAID (Temporally Adaptive Interpolated Distillation), a knowledge distillation method in which the larger Qwen2.5-32B-Instruct serves as the teacher model and Qwen2.5-1.5B-Instruct as the student.
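The TAID objective can be pictured as distilling toward a target distribution that gradually shifts from the student itself to the teacher. Below is a minimal PyTorch sketch of such an interpolated distillation loss; the function name taid_loss, the tensor shapes, and the fixed interpolation weight t are illustrative assumptions (the actual method adapts t over the course of training), not Sakana AI's implementation.

```python
import torch
import torch.nn.functional as F

def taid_loss(student_logits: torch.Tensor,
              teacher_logits: torch.Tensor,
              t: float) -> torch.Tensor:
    """KL divergence between a time-interpolated target and the student.

    The target mixes the (detached) student with the teacher:
        p_t = (1 - t) * stopgrad(q_student) + t * p_teacher
    Early in training (t near 0) the target stays close to the student;
    as t approaches 1 it approaches the teacher distribution.
    """
    log_q = F.log_softmax(student_logits, dim=-1)   # student log-probs
    with torch.no_grad():
        q = log_q.exp()                             # detached student probs
        p = F.softmax(teacher_logits, dim=-1)       # teacher probs
        p_t = (1.0 - t) * q + t * p                 # interpolated target
    # F.kl_div(input=log_q, target=p_t) computes sum p_t * (log p_t - log_q),
    # i.e. KL(p_t || q_student), averaged over the batch.
    return F.kl_div(log_q, p_t, reduction="batchmean")
```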

Key Capabilities

  • Japanese Language Proficiency: Instruction-tuned specifically to follow instructions and hold conversations in Japanese (see the usage sketch after this list).
  • Efficient Knowledge Transfer: Uses TAID for effective distillation, allowing a small model to inherit much of a larger teacher's performance.
  • Large Context Window: Supports a context length of 131,072 tokens, enabling processing of long Japanese documents.
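
Because the model inherits the standard Hugging Face Transformers interface from its Qwen2.5 base, a conversation can be run with a few lines of code. The snippet below is a minimal sketch; the sampling settings and the Japanese prompt are illustrative choices, not a reference configuration from Sakana AI.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/TinySwallow-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 matches the published BF16 weights.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Please briefly explain Japan's four seasons."
messages = [{"role": "user", "content": "日本の四季について簡単に教えてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```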

Use Cases

This model is provided for research and development purposes only and is considered an experimental prototype. It is suitable for:

  • Exploring efficient knowledge distillation techniques.
  • Developing and testing Japanese-centric conversational AI applications.
  • Academic research into large language models and their optimization.

Users should be aware that the model is not intended for commercial use or deployment in mission-critical environments, and its performance is not guaranteed. The model inherits the Apache 2.0 license from Qwen and, because it was trained on Gemma-derived data, is also subject to the Gemma Terms of Use; users must comply with both.