tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Oct 28, 2024 · License: llama3.1 · Architecture: Transformer

Llama-3.1-Swallow-8B-v0.2 by tokyotech-llm is an 8 billion parameter language model built by continual pre-training on Meta Llama 3.1. Training on approximately 200 billion tokens drawn from a large Japanese web corpus, Wikipedia, and mathematical/coding content significantly enhances its Japanese language capabilities while maintaining strong English performance. It features a 32,768-token context length and improves on the Japanese benchmark scores of its Llama 3.1 base.


Model Overview

Llama-3.1-Swallow-8B-v0.2 is an 8 billion parameter large language model developed by tokyotech-llm, built by continual pre-training from Meta Llama 3.1. Training used approximately 200 billion tokens and focused primarily on enhancing Japanese language proficiency while preserving the base model's English capabilities. The training data includes a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and specialized mathematical and coding content.
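
As a rough usage sketch (not taken from the model card itself), the checkpoint can be loaded with the standard Hugging Face transformers API. The dtype, device placement, and sampling settings below are illustrative assumptions; adjust them to your hardware.

```python
# Minimal sketch: load Llama-3.1-Swallow-8B-v0.2 and run a plain text completion.
# Assumes transformers, torch, and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative choice; pick a dtype your GPU supports
    device_map="auto",
)

# This checkpoint is a base (not instruction-tuned) model, so prompt it as a continuation.
prompt = "東京工業大学の主なキャンパスは、"  # "Tokyo Tech's main campuses are ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```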

Key Capabilities & Enhancements

  • Bilingual Proficiency: Significantly improved Japanese language performance compared to the base Llama 3.1, while maintaining strong English language abilities.
  • Extensive Context Window: Supports a context length of 32,768 tokens, allowing it to process longer inputs and generate more coherent, extended responses.
  • Robust Training Data: Continual pre-training leveraged high-quality, filtered datasets, including a refined Swallow Corpus Version 2 for Japanese and a quality-filtered version of The-Stack-v2 for coding content.
  • Performance Benchmarks: Demonstrates competitive performance across various Japanese benchmarks, including JCom., JEMHopQA, NIILC, and WMT20-en-ja, often outperforming other 8B models in its class. It also maintains strong results on English tasks like OpenBookQA and MMLU.

Ideal Use Cases

  • Japanese Language Applications: Excellent for tasks requiring high-quality Japanese text generation, comprehension, and translation (see the translation sketch after this list).
  • Bilingual AI Systems: Suitable for applications that need to operate effectively in both Japanese and English environments.
  • Research and Development: A strong foundation model for further fine-tuning on specific Japanese or bilingual tasks due to its enhanced linguistic capabilities and Llama 3.1 base.
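
Because this checkpoint is a base model rather than an instruction-tuned chat model, translation is typically elicited with few-shot prompting. The sketch below uses the transformers text-generation pipeline; the example sentence pairs and decoding settings are illustrative assumptions, not taken from the model card.

```python
# Hypothetical few-shot English-to-Japanese translation prompt for the base model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tokyotech-llm/Llama-3.1-Swallow-8B-v0.2",
    torch_dtype="auto",
    device_map="auto",
)

# A couple of in-context translation pairs, then the sentence to translate.
few_shot = (
    "English: Thank you very much.\n"
    "Japanese: どうもありがとうございます。\n\n"
    "English: The weather is nice today.\n"
    "Japanese: 今日は天気が良いです。\n\n"
    "English: This model was trained on Japanese web text.\n"
    "Japanese:"
)

result = generator(few_shot, max_new_tokens=64, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())
```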