shisa-ai/shisa-v1-llama3-8b

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8K · License: llama3 · Architecture: Transformer

shisa-ai/shisa-v1-llama3-8b is an 8 billion parameter Llama 3-based instruction-tuned causal language model developed by shisa-ai. Fine-tuned from Meta-Llama-3-8B-Instruct, it demonstrates strong performance on Japanese language benchmarks, achieving an average score of 6.59 across ELYZA-tasks-100, JA MT-Bench, Rakuda, and Tengu-Bench. It is optimized for general-purpose Japanese language tasks and offers a competitive option in its size class.


shisa-v1-llama3-8b: A Llama 3-based Japanese-Optimized LLM

shisa-v1-llama3-8b is an 8 billion parameter instruction-tuned model built upon Meta's Llama 3-8B-Instruct architecture. Developed by shisa-ai, this model has undergone fine-tuning to enhance its performance, particularly in Japanese language understanding and generation.

Key Capabilities & Performance

This model demonstrates competitive performance on several Japanese benchmarks, with the shisa-v1-llama3-8b (8e-6) variant achieving an average score of 6.59. Specific benchmark results include:

  • ELYZA-tasks-100: 6.67
  • JA MT-Bench: 6.95
  • Rakuda: 7.05
  • Tengu-Bench: 5.68

These scores position it favorably against other models in the 7B-14B parameter range on Japanese tasks, such as lightblue/suzume-llama-3-8B-japanese and augmxnt/shisa-gamma-7b-v1.

Training Details

The model was fine-tuned on the augmxnt/ultra-orca-boros-en-ja-v1 dataset using the Axolotl framework. Training ran for 3 epochs at a learning rate of 8e-6, with a sequence length of 8192 tokens, on 8 GPUs at a total batch size of 64. Compute resources for training were provided by Ubitus.
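
For orientation, the reported hyperparameters map onto roughly the following configuration. This is a minimal sketch using Hugging Face TrainingArguments rather than the Axolotl config actually used; the per-device batch size and bf16 setting are assumptions (8 GPUs x 8 per device = 64 total only if no gradient accumulation is used), not documented details.

```python
from transformers import TrainingArguments

# Illustrative mirror of the reported fine-tuning setup. The actual run
# used the Axolotl framework; nothing here is the authors' exact config.
training_args = TrainingArguments(
    output_dir="shisa-v1-llama3-8b",  # hypothetical output path
    learning_rate=8e-6,               # reported learning rate
    num_train_epochs=3,               # reported epoch count
    per_device_train_batch_size=8,    # assumption: 8 GPUs x 8 = 64 total,
                                      # i.e. no gradient accumulation
    bf16=True,                        # assumption: typical for Llama 3 tunes
)
# The 8192-token sequence length is enforced at tokenization/packing time,
# not through TrainingArguments.
```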

Intended Use Cases

Given its strong performance on Japanese benchmarks, shisa-v1-llama3-8b is well-suited for applications requiring robust Japanese language processing, including but not limited to:

  • General-purpose conversational AI in Japanese
  • Text generation and summarization for Japanese content
  • Japanese language understanding tasks

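As a concrete starting point, here is a minimal inference sketch using the Hugging Face transformers library. The model ID is the real repository name; the Japanese prompt and the sampling settings are illustrative choices, not recommendations from the model authors. Since the model is fine-tuned from Meta-Llama-3-8B-Instruct, it is assumed to use the Llama 3 chat template shipped with its tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v1-llama3-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Please briefly explain Japan's four seasons."
messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```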

Popular Sampler Settings

The three most popular sampler configurations among Featherless users for this model tune the following parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
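
For reference, here is a sketch of how such sampler settings are commonly passed to an OpenAI-compatible endpoint. The base URL, API key, and all parameter values below are placeholders, not configurations published by Featherless; parameters outside the OpenAI spec (top_k, repetition_penalty, min_p) are sent via extra_body and are only honored by servers that support them.

```python
from openai import OpenAI

# Placeholder endpoint and key; substitute your provider's actual values.
client = OpenAI(base_url="https://api.featherless.ai/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="shisa-ai/shisa-v1-llama3-8b",
    messages=[{"role": "user", "content": "こんにちは！"}],  # "Hello!"
    temperature=0.7,           # illustrative values, not measured user configs
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard parameters accepted by some OpenAI-compatible servers:
    extra_body={"top_k": 40, "repetition_penalty": 1.05, "min_p": 0.05},
)
print(response.choices[0].message.content)
```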