tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3
Hugging Face
TEXT GENERATIONConcurrency Cost:4Model Size:70BQuant:FP8Ctx Length:32kPublished:Dec 25, 2024License:llama3.1Architecture:Transformer0.0K Warm

Llama-3.1-Swallow-70B-Instruct-v0.3 is a 70 billion parameter instruction-tuned large language model developed by tokyotech-llm, built upon Meta Llama 3.1. It enhances Japanese language capabilities through continual pre-training on approximately 200 billion Japanese and English tokens, while retaining strong English performance. This model is optimized for multi-turn dialogue, generating helpful and detailed responses, and excels in Japanese conversational tasks.

Loading preview...

Llama 3.1 Swallow 70B Instruct v0.3: Enhanced Japanese Conversational AI

Llama-3.1-Swallow-70B-Instruct-v0.3 is a 70 billion parameter instruction-tuned model from tokyotech-llm, based on Meta's Llama 3.1 architecture. This model significantly enhances Japanese language capabilities through extensive continual pre-training on a diverse corpus of approximately 200 billion Japanese and English tokens, including the Swallow Corpus Version 2, Wikipedia articles, and mathematical/coding content. It maintains the strong English language performance of its Llama 3.1 foundation.

Key Capabilities & Differentiators

  • Bilingual Proficiency: Optimized for both Japanese and English, with a particular focus on improving Japanese understanding and generation.
  • Instruction-Tuned for Dialogue: Fine-tuned using synthetic Japanese datasets to generate helpful and detailed responses in multi-turn conversations.
  • Improved Conversational Performance: Outperforms its predecessor, Llama-3.1-Swallow-70B-Instruct-v0.1, by 5.68 points on the Japanese MT-Bench benchmark, indicating enhanced dialogue capabilities.
  • Comprehensive Evaluation: Evaluated across a wide range of Japanese and English benchmarks, including MT-Bench JA, JCommonsenseQA, JHumanEval, MMLU, and HumanEval.

When to Use This Model

This model is particularly well-suited for applications requiring robust bilingual (Japanese and English) conversational AI. Its instruction-tuned nature makes it effective for generating detailed responses to user queries and engaging in multi-turn dialogues, especially in Japanese-centric use cases.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p