shisa-ai/shisa-v1-llama3-8b

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8K · License: llama3 · Architecture: Transformer

shisa-ai/shisa-v1-llama3-8b is an 8 billion parameter Llama 3-based instruction-tuned causal language model developed by shisa-ai. Fine-tuned from Meta-Llama-3-8B-Instruct, it demonstrates strong performance on Japanese language benchmarks, achieving an average score of 6.59 across ELYZA-tasks-100, JA MT-Bench, Rakuda, and Tengu-Bench. It is optimized for general-purpose Japanese language tasks and offers a competitive option in its size class.


shisa-v1-llama3-8b: A Llama 3-based Japanese-Optimized LLM

shisa-v1-llama3-8b is an 8 billion parameter instruction-tuned model built upon Meta's Llama 3-8B-Instruct architecture. Developed by shisa-ai, this model has undergone fine-tuning to enhance its performance, particularly in Japanese language understanding and generation.

Key Capabilities & Performance

This model demonstrates competitive performance on several Japanese benchmarks, with the shisa-v1-llama3-8b (8e-6) variant achieving an average score of 6.59. Specific benchmark results include:

  • ELYZA-tasks-100: 6.67
  • JA MT-Bench: 6.95
  • Rakuda: 7.05
  • Tengu-Bench: 5.68

These scores position it favorably against other models in the 7B-14B parameter range on Japanese tasks, such as lightblue/suzume-llama-3-8B-japanese and augmxnt/shisa-gamma-7b-v1.

Training Details

The model was fine-tuned on the augmxnt/ultra-orca-boros-en-ja-v1 dataset using the Axolotl framework. Training ran for 3 epochs at a learning rate of 8e-6, with a sequence length of 8192 tokens, on 8 GPUs at a total batch size of 64. Compute resources for training were provided by Ubitus.
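
For orientation, the reported hyperparameters map onto roughly the following configuration. This is a minimal sketch using Hugging Face TrainingArguments rather than the Axolotl config actually used; the per-device batch size and bf16 setting are assumptions (8 GPUs x 8 per device = 64 total only if no gradient accumulation is used), not documented details.

```python
from transformers import TrainingArguments

# Illustrative mirror of the reported fine-tuning setup. The actual run
# used the Axolotl framework; nothing here is the authors' exact config.
training_args = TrainingArguments(
    output_dir="shisa-v1-llama3-8b",  # hypothetical output path
    learning_rate=8e-6,               # reported learning rate
    num_train_epochs=3,               # reported epoch count
    per_device_train_batch_size=8,    # assumption: 8 GPUs x 8 = 64 total,
                                      # i.e. no gradient accumulation
    bf16=True,                        # assumption: typical for Llama 3 tunes
)
# The 8192-token sequence length is enforced at tokenization/packing time,
# not through TrainingArguments.
```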

Intended Use Cases

Given its strong performance on Japanese benchmarks, shisa-v1-llama3-8b is well-suited for applications requiring robust Japanese language processing, including but not limited to:

  • General-purpose conversational AI in Japanese
  • Text generation and summarization for Japanese content
  • Japanese language understanding tasks

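As a concrete starting point, here is a minimal inference sketch using the Hugging Face transformers library. The model ID is the real repository name; the Japanese prompt and the sampling settings are illustrative choices, not recommendations from the model authors. Since the model is fine-tuned from Meta-Llama-3-8B-Instruct, it is assumed to use the Llama 3 chat template shipped with its tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v1-llama3-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Please briefly explain Japan's four seasons."
messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```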

Popular Sampler Settings

The three most popular sampler configurations among Featherless users for this model tune the following parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
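
For reference, here is a sketch of how such sampler settings are commonly passed to an OpenAI-compatible endpoint. The base URL, API key, and all parameter values below are placeholders, not configurations published by Featherless; parameters outside the OpenAI spec (top_k, repetition_penalty, min_p) are sent via extra_body and are only honored by servers that support them.

```python
from openai import OpenAI

# Placeholder endpoint and key; substitute your provider's actual values.
client = OpenAI(base_url="https://api.featherless.ai/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="shisa-ai/shisa-v1-llama3-8b",
    messages=[{"role": "user", "content": "こんにちは！"}],  # "Hello!"
    temperature=0.7,           # illustrative values, not measured user configs
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard parameters accepted by some OpenAI-compatible servers:
    extra_body={"top_k": 40, "repetition_penalty": 1.05, "min_p": 0.05},
)
print(response.choices[0].message.content)
```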