tokyotech-llm/Swallow-70b-hf

Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 8k · Published: Nov 25, 2023 · License: llama2 · Architecture: Transformer · Open Weights

Swallow-70b-hf is a 70 billion parameter causal language model developed by TokyoTech-LLM, continually pre-trained from the Llama 2 family with a significant addition of Japanese language data. This model utilizes a tokenizer with a broadened vocabulary for Japanese, enabling more efficient text representation and faster inference. It excels in Japanese language tasks, demonstrating strong performance across various benchmarks including question answering, summarization, and mathematical reasoning, while maintaining competitive English capabilities.


Swallow-70b-hf: A Llama 2-based Model Optimized for Japanese

Swallow-70b-hf is a 70 billion parameter language model developed by TokyoTech-LLM, built upon the Llama 2 architecture. Its core distinction lies in its continual pre-training on extensive Japanese language data, which significantly improves its performance on Japanese tasks compared to the original Llama 2 models. The model employs a tokenizer with a vocabulary extended for Japanese, which leads to more efficient text representation and faster inference.
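
As a rough illustration, the snippet below loads the model with Hugging Face transformers and generates a continuation for a Japanese prompt. It is a minimal sketch, assuming a host with enough GPU memory to hold a 70B model in bf16 across devices; the prompt and sampling parameters are illustrative, not prescribed by the model card.

```python
# Minimal sketch: load Swallow-70b-hf and generate a Japanese continuation.
# Assumes sufficient multi-GPU memory for 70B bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Swallow-70b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory
    device_map="auto",           # shard layers across available GPUs
)

# Swallow-70b-hf is a base (non-instruct) model, so prompt it with text
# to continue rather than chat-style instructions.
prompt = "東京工業大学の主なキャンパスは、"  # "Tokyo Tech's main campuses are..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base model rather than an instruction-tuned variant, completion-style prompts like the one above generally work better than chat-style instructions.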

Key Capabilities

  • Enhanced Japanese Language Proficiency: Demonstrates superior performance across a range of Japanese benchmarks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, XL-Sum (summarization), MGSM (mathematical reasoning), and WMT20 English↔Japanese machine translation.
  • Efficient Tokenization: Utilizes a Japanese-optimized tokenizer with an extended vocabulary, so Japanese text encodes into fewer tokens and inference runs faster (see the token-count sketch after this list).
  • Strong Foundation: Benefits from the robust architecture of the Llama 2 family.
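
To make the tokenizer point concrete, the sketch below counts the tokens produced for a Japanese sentence; fewer tokens per sentence means shorter sequences and faster generation. The comparison against the base Llama 2 tokenizer is shown only as an assumption for illustration, since meta-llama/Llama-2-70b-hf is gated and requires accepting its license.

```python
# Rough sketch: inspect the tokenizer's Japanese efficiency.
from transformers import AutoTokenizer

swallow_tok = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-70b-hf")

text = "自然言語処理の研究はここ数年で大きく進展した。"
print("vocab size:", swallow_tok.vocab_size)
print("Swallow tokens:", len(swallow_tok.tokenize(text)))

# For comparison against the base Llama 2 tokenizer (gated; requires access):
# llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
# print("Llama 2 tokens:", len(llama_tok.tokenize(text)))
```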

When to Use This Model

  • Japanese NLP Applications: Ideal for tasks requiring high accuracy and efficiency in Japanese language understanding and generation.
  • Cross-Lingual Research: Suitable for projects involving both Japanese and English, where strong Japanese performance is critical.
  • Resource-Efficient Inference: The Japanese-optimized tokenizer encodes Japanese text in fewer tokens, shortening sequences and lowering inference cost; a quantized-loading sketch follows below.
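
For memory-constrained deployments, one option (an assumption on our part, not something the model card prescribes) is 4-bit quantized loading via bitsandbytes, sketched below. Verify the quality trade-off on your own Japanese workloads before relying on it.

```python
# Hedged sketch: load the model with 4-bit quantization to lower the
# GPU memory footprint. Exact memory needs and quality impact should
# be measured for your workload.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tokyotech-llm/Swallow-70b-hf"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit weights
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```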