tokyotech-llm/Swallow-70b-NVE-hf
The Swallow-70b-NVE-hf model by tokyotech-llm is a 69 billion parameter language model continually pre-trained from the Llama 2 family, with a strong focus on Japanese language data. This specific variant, NVE (No Vocabulary Expansion), utilizes a standard Llama 2 tokenizer without Japanese vocabulary additions. It excels in Japanese language tasks, demonstrating significant performance improvements over its Llama 2 base, while maintaining competitive English language capabilities.
Swallow-70b-NVE-hf: Japanese-Enhanced Llama 2 Variant
Swallow-70b-NVE-hf is a 69 billion parameter large language model developed by tokyotech-llm, built upon the Llama 2 architecture. It has undergone extensive continual pre-training, primarily incorporating a substantial amount of Japanese language data, alongside datasets like RefinedWeb, Swallow Corpus, and The Pile. This model is part of the "No Vocabulary Expansion" (NVE) series, meaning it uses the original Llama 2 tokenizer without additional Japanese vocabulary, which can be a consideration for tokenization efficiency compared to models with expanded vocabularies.
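Because the checkpoint is published in the standard Hugging Face format, it can be loaded like any other Llama-2-based causal language model with the transformers library. The snippet below is a minimal sketch, not an official recipe: the dtype, device mapping, and sampling settings are illustrative assumptions for running a 70B model across multiple GPUs.

```python
# Minimal loading/generation sketch for Swallow-70b-NVE-hf (settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Swallow-70b-NVE-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights to reduce memory footprint
    device_map="auto",           # shard the 70B parameters across available GPUs
)

# Japanese completion prompt; the model is a base (non-instruct) LM, so plain text works best.
prompt = "東京工業大学の主なキャンパスは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```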
Key Capabilities & Performance
The model demonstrates strong performance in Japanese language tasks, significantly outperforming the base Llama 2 70B model across various benchmarks such as JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, and WMT20 machine translation. For instance, it achieves 0.9410 on JCommonsenseQA and 0.7024 on NIILC, surpassing Llama 2's scores. While optimized for Japanese, it maintains competitive, though slightly lower, performance on English benchmarks compared to the Llama 2 70B model.
Use Cases & Considerations
This model is particularly well-suited for applications requiring high proficiency in Japanese language understanding and generation. Its continual pre-training on Japanese data makes it a strong candidate for tasks such as Japanese question answering, summarization, and translation. Developers should note the NVE characteristic: because the model keeps the original Llama 2 tokenizer, Japanese text generally splits into more tokens than it would under a vocabulary-expanded variant, which affects effective context length and inference cost in Japanese-heavy workflows. The model is still at an early stage of research and development and has not been extensively tuned for safety or human alignment.
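One rough way to gauge the tokenization-efficiency trade-off mentioned above is to count tokens for the same Japanese text under the NVE tokenizer and under a vocabulary-expanded sibling. The sketch below assumes tokyotech-llm/Swallow-70b-hf as the expanded comparison point; adjust the IDs to whichever variants you are actually evaluating.

```python
# Compare Japanese token counts: NVE (original Llama 2 tokenizer) vs. a
# vocabulary-expanded Swallow tokenizer. Model IDs are assumptions based on
# the tokyotech-llm model family naming.
from transformers import AutoTokenizer

nve_tok = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-70b-NVE-hf")
exp_tok = AutoTokenizer.from_pretrained("tokyotech-llm/Swallow-70b-hf")

text = "吾輩は猫である。名前はまだ無い。"
print("NVE tokens:     ", len(nve_tok(text)["input_ids"]))
print("Expanded tokens:", len(exp_tok(text)["input_ids"]))
```

A higher NVE token count for the same text means fewer Japanese characters fit in a given context window and more tokens are generated per sentence, which is the main practical consequence of skipping vocabulary expansion.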