Llama-3.1-Swallow-8B-Instruct-v0.2 Overview
This model is an 8-billion-parameter instruction-tuned variant from the Llama 3.1 Swallow series, developed by tokyotech-llm. It is built by continual pre-training of the original Meta Llama 3.1 model, enhancing Japanese language proficiency while preserving English capabilities. Pre-training used approximately 200 billion tokens drawn from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical and coding content.
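A minimal loading sketch with Hugging Face Transformers is shown below. It assumes the checkpoint is published under the repository id `tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2`; adjust the id, dtype, and device placement for your environment.

```python
# Minimal loading sketch (assumed repo id; not an official quickstart).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights fit on a single ~24 GB GPU in bf16
    device_map="auto",           # let accelerate place layers across available devices
)
```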
Key Capabilities & Features
- Enhanced Japanese Performance: Achieves a Japanese average score of 0.5141 across various benchmarks, outperforming other Llama 3 and Qwen2 models in its size class.
- Strong English Performance: Maintains competitive English task performance with an average score of 0.5823.
- Instruction-Tuned: Fine-tuned on synthetic Japanese and English datasets, including `Llama-3.1-LMSYS-Chat-1M-Synth-Ja`, `Swallow-Magpie-Ultra-v0.1`, and `filtered-magpie-ultra-en`.
- Multi-turn Dialogue: Scores 0.5584 on Japanese MT-Bench, demonstrating solid multi-turn conversation performance (see the sketch after this list).
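The following sketch illustrates multi-turn Japanese dialogue using the tokenizer's built-in chat template (Llama 3.1 instruct format). It assumes `model` and `tokenizer` were loaded as in the earlier snippet; the example messages and sampling parameters are illustrative, not recommended settings.

```python
# Multi-turn dialogue sketch: two prior turns plus a follow-up question.
messages = [
    {"role": "user", "content": "日本の首都はどこですか？"},            # "What is the capital of Japan?"
    {"role": "assistant", "content": "日本の首都は東京です。"},         # "The capital of Japan is Tokyo."
    {"role": "user", "content": "その都市の人口はおよそ何人ですか？"},  # follow-up: "Roughly what is its population?"
]

# Render the conversation with the model's chat template and append the
# assistant header so generation continues as the assistant's next turn.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```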
Good For
- Applications requiring robust performance in both Japanese and English.
- Tasks involving question answering, summarization, and code generation in Japanese contexts.
- Multi-turn conversational AI systems targeting Japanese users.