Overview of rinna/llama-3-youko-70b-instruct
rinna/llama-3-youko-70b-instruct is a 70-billion-parameter instruction-tuned model developed by rinna, built on the rinna/llama-3-youko-70b base model. It uses the Llama 3 architecture (an 80-layer transformer with a hidden size of 8192) and the Llama-3 chat format.
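For reference, the Llama-3 chat format wraps each turn in header and end-of-turn tokens. The layout below reflects the standard Meta-Llama-3 template; the placeholder text is illustrative:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```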
Key Capabilities and Training
This model was developed through a two-stage process:
- Supervised Fine-Tuning (SFT): Initial instruction-tuning was performed using a proprietary rinna dataset.
- Model Merging with Chat Vector: The SFT model was further enhanced by adding a "Chat Vector." This vector was derived by subtracting the parameter vectors of meta-llama/Meta-Llama-3-70B from meta-llama/Meta-Llama-3-70B-Instruct; 0.5 times this difference was then added to the llama-3-youko-70b-sft model. This technique, following the Chat Vector approach, aims to improve instruction-following and alignment (see the sketch after this list).
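The arithmetic of the merge is simple: merged = sft + 0.5 * (instruct - base), applied parameter-wise. Below is a minimal sketch of that computation, assuming all three checkpoints share the Meta-Llama-3 tokenizer and parameter shapes; the SFT repo id ("rinna/llama-3-youko-70b-sft") is hypothetical, as rinna has not published that intermediate checkpoint, and materializing three 70B models this way needs several hundred GB of memory (a real merge would stream shards from disk):

```python
import torch
from transformers import AutoModelForCausalLM

def load_state_dict(repo_id):
    # Load a checkpoint in bf16 and return its parameter dict.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return model.state_dict()

base_sd = load_state_dict("meta-llama/Meta-Llama-3-70B")
inst_sd = load_state_dict("meta-llama/Meta-Llama-3-70B-Instruct")

# Hypothetical repo id for rinna's unpublished SFT checkpoint.
sft = AutoModelForCausalLM.from_pretrained(
    "rinna/llama-3-youko-70b-sft", torch_dtype=torch.bfloat16
)
merged_sd = sft.state_dict()

# chat_vector = instruct - base; merged = sft + 0.5 * chat_vector
for name, tensor in merged_sd.items():
    merged_sd[name] = tensor + 0.5 * (inst_sd[name] - base_sd[name])

sft.load_state_dict(merged_sd)
sft.save_pretrained("llama-3-youko-70b-instruct-merged")
```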
Usage Considerations
Users should note that this instruction-tuned model can tend to generate repeated text. To mitigate this, set repetition_penalty=1.1 during generation, as was done in the model's evaluation experiments; a usage sketch follows below. The model uses the original meta-llama/Meta-Llama-3-70B-Instruct tokenizer.
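As a minimal generation sketch (assuming the standard Hugging Face transformers API; the prompt content is illustrative), the tokenizer's chat template produces the Llama-3 format shown earlier, and repetition_penalty=1.1 is passed to generate:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/llama-3-youko-70b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a Llama-3-format prompt from chat messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Chat Vector technique in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    repetition_penalty=1.1,  # recommended above to curb repeated text
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```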