tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3
The Llama-3.1-Swallow-8B-Instruct-v0.3 model by tokyotech-llm is an 8 billion parameter instruction-tuned large language model, continually pre-trained from Meta's Llama 3.1. It significantly enhances Japanese language capabilities while retaining strong English performance, utilizing approximately 200 billion tokens from Japanese web corpora, Wikipedia, and technical content. This model excels in multi-turn Japanese dialogue, achieving state-of-the-art performance on Japanese MT-Bench among open-source LLMs of comparable size.
Loading preview...
Llama-3.1-Swallow-8B-Instruct-v0.3 Overview
This model is an 8 billion parameter instruction-tuned variant from the Llama 3.1 Swallow series, developed by tokyotech-llm. It is built upon Meta's Llama 3.1 base models through continual pre-training with a focus on enhancing Japanese language capabilities while maintaining strong English performance. The pre-training involved approximately 200 billion tokens from diverse sources, including a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia, and mathematical/coding content.
Key Capabilities & Features
- Bilingual Proficiency: Significantly improved Japanese language understanding and generation, alongside robust English capabilities.
- Instruction-Tuned: Optimized for following user instructions and engaging in multi-turn conversations, achieved through supervised fine-tuning on specially built synthetic Japanese data.
- State-of-the-Art Japanese MT-Bench Performance: This v0.3 release demonstrates leading performance on Japanese MT-Bench among open-source LLMs with 8 billion parameters or less, showing an 8.4-point improvement over its predecessor.
- Llama 3.1 Architecture: Leverages the architectural strengths of the Meta Llama 3.1 series.
Ideal Use Cases
- Japanese-centric Applications: Excellent for chatbots, content generation, and conversational AI systems requiring high proficiency in Japanese.
- Bilingual AI Assistants: Suitable for applications that need to seamlessly handle both Japanese and English interactions.
- Research and Development: A strong foundation for further fine-tuning or research into cross-lingual LLM adaptation, particularly for Japanese language tasks.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.