Overview
Swallow-13b-NVE-hf is a 13-billion-parameter language model developed by TokyoTech-LLM, built on the Llama 2 architecture. It was continually pre-trained with a large amount of additional Japanese-language data to improve performance on Japanese-centric tasks. The "NVE" (No Vocabulary Expansion) designation means this variant retains the original Llama 2 tokenizer rather than the Japanese-expanded vocabulary used by other Swallow models.
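As a minimal sketch of how the model might be loaded for inference, the snippet below uses Hugging Face `transformers` with the assumed Hub id `tokyotech-llm/Swallow-13b-NVE-hf`; the dtype, device placement, and generation settings are illustrative, not prescribed.

```python
# Hypothetical inference sketch; model id, dtype, and generation settings
# are assumptions, not an official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tokyotech-llm/Swallow-13b-NVE-hf"  # assumed Hugging Face Hub id

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a continuation of `prompt` with the Swallow NVE model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # ~26 GB of weights; bf16 halves memory
        device_map="auto",           # spread layers across available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Japanese prompt, since that is the model's focus.
    print(generate("東京工業大学の主なキャンパスは、"))
```

Because NVE keeps the stock Llama 2 tokenizer, no custom tokenizer files are needed; the same pipeline that serves Llama 2 13B should work unchanged.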
Key Capabilities
- Strong Japanese Language Performance: Benchmarks show significant improvements over Llama 2 13B on various Japanese tasks, including JCommonsenseQA, JEMHopQA, NIILC, JSQuAD, and MGSM.
- Retained English Capabilities: While optimized for Japanese, the model remains broadly competitive with its Llama 2 base on English benchmarks, though some scores are slightly lower, an expected trade-off of continual pre-training focused on Japanese data.
- Llama 2 Foundation: Benefits from the robust architecture and pre-training of the Llama 2 family.
Should you use this for your use case?
- Japanese Language Applications: Ideal for tasks requiring high proficiency in Japanese text understanding, generation, and reasoning.
- Research and Development: Suitable for researchers exploring cross-lingual adaptation and the impact of continual pre-training on specific languages.
- Base Model for Fine-tuning: Can serve as a strong base for further fine-tuning on specific Japanese-language downstream tasks, especially if vocabulary expansion is not desired or causes compatibility issues with existing pipelines.