tokyotech-llm/Swallow-70b-instruct-hf
The Swallow-70b-instruct-hf model, developed by TokyoTech-LLM, is a 70-billion-parameter instruction-tuned causal language model built on the Llama 2 architecture. It has undergone continual pre-training on a large amount of additional Japanese data and uses a broadened vocabulary for more compact Japanese text representation and faster inference. The model performs strongly across Japanese benchmarks spanning question answering, summarization, and mathematical reasoning, while maintaining competitive English capabilities.
Swallow-70b-instruct-hf: Japanese-Enhanced Llama 2 Model
Swallow-70b-instruct-hf is a 70 billion parameter instruction-tuned language model developed by TokyoTech-LLM. It is based on the Llama 2 architecture and has been continually pre-trained with extensive Japanese language data, significantly enhancing its proficiency in Japanese. A key feature is its tokenizer, which incorporates a broadened vocabulary specifically for Japanese, leading to more efficient text representation and faster inference.
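The effect of the expanded Japanese vocabulary can be inspected directly. Below is a minimal sketch, assuming the tokenizer loads through the standard transformers AutoTokenizer API; the example sentence is illustrative and the exact token count will vary.

```python
from transformers import AutoTokenizer

# Swallow extends the Llama 2 vocabulary with additional Japanese tokens,
# so Japanese text encodes into fewer tokens than under the base tokenizer.
tokenizer = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-70b-instruct-hf")

text = "東京工業大学で開発された大規模言語モデルです。"
tokens = tokenizer.tokenize(text)

# Shorter token sequences mean less compute per sentence and faster inference.
print(f"{len(tokens)} tokens: {tokens}")
```

Tokenizing the same sentence with the base Llama 2 tokenizer typically yields a noticeably longer sequence, which is where the representation-efficiency and inference-speed gains come from.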
Key Capabilities and Performance
- Superior Japanese Language Performance: Strong results across a range of Japanese benchmarks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, XL-Sum, WMT20, and MGSM. For instance, the 70B Swallow model scores 0.9348 on JCommonsenseQA and 0.4840 on MGSM, outperforming its Llama 2 counterpart on most Japanese tasks.
- Efficient Tokenization: Utilizes a tokenizer with an expanded Japanese vocabulary, allowing for more compact text representation and improved inference speed.
- Instruction-Tuned: This version is instruction-tuned, making it suitable for conversational and instruction-following tasks (see the usage sketch after this list).
- Competitive English Performance: While optimized for Japanese, the model maintains solid performance on English benchmarks such as OpenBookQA, TriviaQA, HellaSwag, SQuAD2.0, XWINO, and GSM8K.
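As a usage sketch for the instruction-following capability above: the snippet below assumes the Alpaca-style Japanese prompt template reported for the Swallow instruct models (the exact wording should be verified against the official model card) and enough GPU memory to host the 70B weights via accelerate's device_map="auto".

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tokyotech-llm/Swallow-70b-instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# bfloat16 + device_map="auto" shards the 70B weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Alpaca-style Japanese instruction template (assumed; confirm against the
# official model card before relying on it).
prompt = (
    "以下に、あるタスクを説明する指示があります。"
    "リクエストを適切に完了するための回答を記述してください。\n\n"
    "### 指示:\n東京工業大学について簡単に紹介してください。\n\n"
    "### 応答:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```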
Training Details
The model underwent continual pre-training on a mix of Japanese Wikipedia, RefinedWeb, the Swallow Corpus, and The Pile. Instruction tuning used Anthropic HH-RLHF, Databricks Dolly 15k, and the OpenAssistant Conversations Dataset, all adapted for Japanese.
Good for
- Applications requiring high-quality Japanese language understanding and generation.
- Instruction-following tasks in Japanese.
- Scenarios where efficient processing of Japanese text is crucial due to optimized tokenization.
- Research and development in multilingual LLMs, particularly for Japanese-English contexts.