tokyotech-llm/Llama-3.1-Swallow-8B-v0.2
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Oct 28, 2024 · License: llama3.1 · Architecture: Transformer

Llama-3.1-Swallow-8B-v0.2 by tokyotech-llm is an 8 billion parameter language model built by continual pre-training on Meta Llama 3.1. Training on approximately 200 billion tokens drawn from a large Japanese web corpus, Wikipedia, and mathematical/coding content significantly enhances its Japanese language capabilities while maintaining strong English performance. It features a 32,768-token context length and improves on the Japanese benchmark scores of its Llama 3.1 base.


Model Overview

Llama-3.1-Swallow-8B-v0.2 is an 8 billion parameter large language model developed by tokyotech-llm, built by continual pre-training from Meta Llama 3.1. Training used approximately 200 billion tokens and focused primarily on enhancing Japanese language proficiency while preserving the base model's English capabilities. The training data includes a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and specialized mathematical and coding content.
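
As a rough usage sketch (not taken from the model card itself), the checkpoint can be loaded with the standard Hugging Face transformers API. The dtype, device placement, and sampling settings below are illustrative assumptions; adjust them to your hardware.

```python
# Minimal sketch: load Llama-3.1-Swallow-8B-v0.2 and run a plain text completion.
# Assumes transformers, torch, and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative choice; pick a dtype your GPU supports
    device_map="auto",
)

# This checkpoint is a base (not instruction-tuned) model, so prompt it as a continuation.
prompt = "東京工業大学の主なキャンパスは、"  # "Tokyo Tech's main campuses are ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```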

Key Capabilities & Enhancements

  • Bilingual Proficiency: Significantly improved Japanese language performance compared to the base Llama 3.1, while maintaining strong English language abilities.
  • Extensive Context Window: Supports a context length of 32,768 tokens, allowing it to process longer inputs and generate more coherent, extended responses.
  • Robust Training Data: Continual pre-training leveraged high-quality, filtered datasets, including a refined Swallow Corpus Version 2 for Japanese and a quality-filtered version of The-Stack-v2 for coding content.
  • Performance Benchmarks: Demonstrates competitive performance across various Japanese benchmarks, including JCom., JEMHopQA, NIILC, and WMT20-en-ja, often outperforming other 8B models in its class. It also maintains strong results on English tasks like OpenBookQA and MMLU.

Ideal Use Cases

  • Japanese Language Applications: Excellent for tasks requiring high-quality Japanese text generation, comprehension, and translation (see the translation sketch after this list).
  • Bilingual AI Systems: Suitable for applications that need to operate effectively in both Japanese and English environments.
  • Research and Development: A strong foundation model for further fine-tuning on specific Japanese or bilingual tasks due to its enhanced linguistic capabilities and Llama 3.1 base.
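
Because this checkpoint is a base model rather than an instruction-tuned chat model, translation is typically elicited with few-shot prompting. The sketch below uses the transformers text-generation pipeline; the example sentence pairs and decoding settings are illustrative assumptions, not taken from the model card.

```python
# Hypothetical few-shot English-to-Japanese translation prompt for the base model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tokyotech-llm/Llama-3.1-Swallow-8B-v0.2",
    torch_dtype="auto",
    device_map="auto",
)

# A couple of in-context translation pairs, then the sentence to translate.
few_shot = (
    "English: Thank you very much.\n"
    "Japanese: どうもありがとうございます。\n\n"
    "English: The weather is nice today.\n"
    "Japanese: 今日は天気が良いです。\n\n"
    "English: This model was trained on Japanese web text.\n"
    "Japanese:"
)

result = generator(few_shot, max_new_tokens=64, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())
```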