tokyotech-llm/Swallow-70b-instruct-hf
The Swallow-70b-instruct-hf model, developed by TokyoTech-LLM, is a 70-billion-parameter instruction-tuned causal language model built on the Llama 2 architecture. It has undergone continual pre-training on a large amount of additional Japanese data and uses a broadened vocabulary for more compact Japanese text representation and faster inference. The model performs strongly across Japanese benchmarks spanning question answering, summarization, and mathematical reasoning, while maintaining competitive English capabilities.
Swallow-70b-instruct-hf: Japanese-Enhanced Llama 2 Model
Swallow-70b-instruct-hf is a 70 billion parameter instruction-tuned language model developed by TokyoTech-LLM. It is based on the Llama 2 architecture and has been continually pre-trained with extensive Japanese language data, significantly enhancing its proficiency in Japanese. A key feature is its tokenizer, which incorporates a broadened vocabulary specifically for Japanese, leading to more efficient text representation and faster inference.
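The effect of the expanded Japanese vocabulary can be inspected directly. Below is a minimal sketch, assuming the tokenizer loads through the standard transformers AutoTokenizer API; the example sentence is illustrative and the exact token count will vary.

```python
from transformers import AutoTokenizer

# Swallow extends the Llama 2 vocabulary with additional Japanese tokens,
# so Japanese text encodes into fewer tokens than under the base tokenizer.
tokenizer = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-70b-instruct-hf")

text = "東京工業大学で開発された大規模言語モデルです。"
tokens = tokenizer.tokenize(text)

# Shorter token sequences mean less compute per sentence and faster inference.
print(f"{len(tokens)} tokens: {tokens}")
```

Tokenizing the same sentence with the base Llama 2 tokenizer typically yields a noticeably longer sequence, which is where the representation-efficiency and inference-speed gains come from.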
Key Capabilities and Performance
- Superior Japanese Language Performance: Strong results across a range of Japanese benchmarks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, XL-Sum, WMT20, and MGSM. For instance, the 70B Swallow model scores 0.9348 on JCommonsenseQA and 0.4840 on MGSM, outperforming its Llama 2 counterpart on most Japanese tasks.
- Efficient Tokenization: Utilizes a tokenizer with an expanded Japanese vocabulary, allowing for more compact text representation and improved inference speed.
- Instruction-Tuned: This version is instruction-tuned, making it suitable for conversational and instruction-following tasks (see the usage sketch after this list).
- Competitive English Performance: While optimized for Japanese, the model maintains solid performance on English benchmarks such as OpenBookQA, TriviaQA, HellaSwag, SQuAD2.0, XWINO, and GSM8K.
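As a usage sketch for the instruction-following capability above: the snippet below assumes the Alpaca-style Japanese prompt template reported for the Swallow instruct models (the exact wording should be verified against the official model card) and enough GPU memory to host the 70B weights via accelerate's device_map="auto".

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Swallow-70b-instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# bfloat16 + device_map="auto" shards the 70B weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Alpaca-style Japanese instruction template (assumed; confirm against the
# official model card before relying on it).
prompt = (
    "以下に、あるタスクを説明する指示があります。"
    "リクエストを適切に完了するための回答を記述してください。\n\n"
    "### 指示:\n東京工業大学について簡単に紹介してください。\n\n"
    "### 応答:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```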
Training Details
The model underwent continual pre-training on a mix of Japanese Wikipedia, RefinedWeb, the Swallow Corpus, and The Pile. Instruction tuning used Anthropic HH-RLHF, Databricks Dolly 15k, and the OpenAssistant Conversations Dataset, all adapted for Japanese.
Good for
- Applications requiring high-quality Japanese language understanding and generation.
- Instruction-following tasks in Japanese.
- Scenarios where efficient processing of Japanese text is crucial due to optimized tokenization.
- Research and development in multilingual LLMs, particularly for Japanese-English contexts.