rinna/llama-3-youko-70b-instruct

Status: Warm
Visibility: Public
Parameters: 70B
Quantization: FP8
Context length: 8192
License: llama3
Source: Hugging Face
Overview of rinna/llama-3-youko-70b-instruct

rinna/llama-3-youko-70b-instruct is a 70-billion-parameter instruction-tuned model developed by rinna, built on the rinna/llama-3-youko-70b base model. It uses the Llama 3 architecture, an 80-layer transformer with a hidden size of 8192, and follows the Llama-3 chat format.
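To make the Llama-3 chat format concrete, here is a minimal sketch of how a conversation is laid out with the format's special tokens. In practice you would call the tokenizer's `apply_chat_template` method rather than building the string by hand; the helper function and message contents below are illustrative.

```python
def build_llama3_prompt(messages):
    """Assemble a prompt in the Llama-3 chat format.

    `messages` is a list of {"role": ..., "content": ...} dicts. This just
    makes the special-token layout explicit; real code should rely on the
    tokenizer's chat template instead.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Leave the assistant header open so generation continues from here.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who was Kitaro Nishida?"},
])
```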

Key Capabilities and Training

This model was developed through a two-stage process:

  • Supervised Fine-Tuning (SFT): Initial instruction-tuning was performed using a proprietary rinna dataset.
  • Model Merging with Chat Vector: The SFT model was further enhanced by adding a "Chat Vector." This vector was obtained by subtracting the parameter vectors of meta-llama/Meta-Llama-3-70B from those of meta-llama/Meta-Llama-3-70B-Instruct; 0.5 times this difference was then added to the parameters of the llama-3-youko-70b-sft model. The Chat Vector approach aims to improve instruction-following and model alignment.
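The merge step above is plain parameter arithmetic: merged = sft + 0.5 × (instruct − base). The sketch below illustrates it with scalar stand-ins for the model's weight tensors; with real checkpoints you would iterate over matching state-dict entries (and merges of this kind typically exclude embedding and norm layers, which the model card does not detail).

```python
def apply_chat_vector(sft, base, instruct, alpha=0.5):
    """Merged weights: sft + alpha * (instruct - base).

    Plain floats stand in for parameter tensors; the dict keys play the
    role of state-dict parameter names.
    """
    return {name: sft[name] + alpha * (instruct[name] - base[name])
            for name in sft}

base     = {"w": 1.0}   # meta-llama/Meta-Llama-3-70B (illustrative value)
instruct = {"w": 3.0}   # Meta-Llama-3-70B-Instruct: base + chat tuning
sft      = {"w": 2.0}   # llama-3-youko-70b after rinna's SFT
merged = apply_chat_vector(sft, base, instruct, alpha=0.5)
# merged["w"] = 2.0 + 0.5 * (3.0 - 1.0) = 3.0
```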

Usage Considerations

Users should note that this instruction-tuned model may tend to generate repeated text. To mitigate this, a repetition penalty of 1.1 (`repetition_penalty=1.1`) is recommended during generation, as was used in the model's evaluation experiments. The model uses the original meta-llama/Meta-Llama-3-70B-Instruct tokenizer.