Swallow-7b-NVE-instruct-hf: Japanese-Enhanced Llama 2 Instruction Model
The tokyotech-llm/Swallow-7b-NVE-instruct-hf is a 7 billion parameter instruction-tuned model developed by TokyoTech-LLM. It is built upon the Llama 2 architecture and has undergone continual pre-training on a large amount of Japanese language data, making it highly proficient in Japanese. This particular model is an "NVE" (No Vocabulary Expansion) variant, meaning it retains the standard Llama 2 tokenizer rather than one with an expanded Japanese vocabulary. As a result, it typically produces more tokens per Japanese character, and is therefore less token-efficient on Japanese text, than the vocabulary-expanded Swallow models.
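As a toy illustration of why vocabulary expansion matters for token efficiency (a minimal sketch, not the real tokenizers: it assumes the unexpanded tokenizer falls back to UTF-8 bytes for Japanese characters absent from its vocabulary, as the Llama 2 SentencePiece tokenizer does, and it uses a small hypothetical expanded vocabulary):

```python
# Toy illustration (not the actual Swallow or Llama 2 tokenizers): how
# byte-level fallback inflates token counts for Japanese text compared
# with a vocabulary that contains whole Japanese words.

def byte_fallback_token_count(text: str) -> int:
    """Worst case without Japanese vocabulary: every character falls
    back to its UTF-8 bytes, one token per byte."""
    return len(text.encode("utf-8"))

def expanded_vocab_token_count(text: str, vocab: set[str]) -> int:
    """Greedy longest-match segmentation against a toy expanded
    vocabulary; characters not covered still fall back to bytes."""
    count, i = 0, 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            if text[i:i + length] in vocab:
                count += 1
                i += length
                break
        else:
            count += len(text[i].encode("utf-8"))  # byte fallback
            i += 1
    return count

toy_vocab = {"東京", "工業", "大学"}  # hypothetical merged Japanese tokens
text = "東京工業大学"

print(byte_fallback_token_count(text))              # 18 (6 chars x 3 bytes)
print(expanded_vocab_token_count(text, toy_vocab))  # 3
```

The same six-character string costs 18 tokens under pure byte fallback but only 3 with the toy expanded vocabulary, which is the efficiency trade-off the NVE variant gives up.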
Key Capabilities and Features
- Japanese Language Proficiency: Excels in various Japanese tasks, outperforming the base Llama 2 7B model across benchmarks like JCommonsenseQA, JEMHopQA, NIILC, and JSQuAD.
- Instruction Following: Fine-tuned using supervised fine-tuning (SFT) on datasets including Japanese translations of Anthropic HH-RLHF, Databricks Dolly 15k, and OpenAssistant Conversations Dataset.
- Bilingual Support: While optimized for Japanese, it also handles English, though its performance on English benchmarks is generally lower than that of the original Llama 2.
- Llama 2 Foundation: Inherits the robust architecture of the Llama 2 family.
Training and Datasets
The model was continually pre-trained on a diverse corpus including Japanese Wikipedia, RefinedWeb, The Pile, and the project's own Swallow Corpus. Instruction tuning leveraged Japanese versions of popular instruction datasets.
Usage Considerations
This model is suitable for applications requiring strong instruction-following capabilities in Japanese. Developers should note that, because this NVE variant lacks the expanded Japanese vocabulary, the vocabulary-expanded Swallow models may offer faster Japanese inference by emitting fewer tokens per sentence. The model is still in early research stages and has not been extensively tuned for safety or human alignment.