upstage/Llama-2-70b-instruct
upstage/Llama-2-70b-instruct is a 69-billion-parameter instruction-tuned causal language model from Upstage, built on the LLaMA-2 architecture. It is fine-tuned on an Orca-style dataset and uses dynamic RoPE scaling, enabling it to handle input contexts of up to 32,768 tokens. It performs strongly on benchmarks such as ARC-Challenge, HellaSwag, MMLU, and TruthfulQA, making it suitable for general-purpose conversational AI and complex reasoning tasks.
Overview
upstage/Llama-2-70b-instruct is a 69-billion-parameter instruction-tuned language model developed by Upstage. It is built on the LLaMA-2 backbone and fine-tuned on an Orca-style dataset. A key feature is its enhanced context handling, supporting up to 32,768 input tokens through dynamic RoPE scaling.
Key Capabilities & Performance
This model demonstrates competitive performance across several standard benchmarks, as evaluated on the Open LLM Leaderboard. It was tested on:
- ARC-Challenge
- HellaSwag
- MMLU
- TruthfulQA
- MT-bench (for multi-turn open-ended questions)
Compared to the base Llama-2-70b model, this Upstage fine-tune shows improved scores, with an average H4 score of 72.3 and an MT-bench score of 7.24375. The model's ability to process longer inputs makes it versatile for applications requiring extended context understanding.
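The H4 average cited above is the mean of the four Open LLM Leaderboard task scores (ARC-Challenge, HellaSwag, MMLU, TruthfulQA). A minimal sketch of that calculation, using placeholder per-task scores rather than this model's published numbers:

```python
# Compute an "H4" average: the mean of the four Open LLM Leaderboard
# task scores. The per-task values below are placeholders for
# illustration, NOT the published scores of this model.
def h4_average(arc: float, hellaswag: float, mmlu: float, truthfulqa: float) -> float:
    return (arc + hellaswag + mmlu + truthfulqa) / 4

placeholder_scores = {
    "arc": 68.0,        # ARC-Challenge
    "hellaswag": 87.0,  # HellaSwag
    "mmlu": 69.0,       # MMLU
    "truthfulqa": 64.0, # TruthfulQA
}
print(h4_average(**placeholder_scores))  # → 72.0
```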
Usage Considerations
- Extended Context: The model leverages `rope_scaling` with a dynamic factor of 2, allowing it to process inputs exceeding 10,000 tokens.
- License: The fine-tuned checkpoints are released under the Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0).
- Prompt Format: It uses a specific prompt template for instruction following:
```
### System:
{System}

### User:
{User}

### Assistant:
{Assistant}
```
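The usage points above can be sketched in code. This is a minimal, non-authoritative example assuming the Hugging Face `transformers` API: `build_prompt` and `load_model` are hypothetical helpers, and the template whitespace (one blank line between sections) is an assumption worth checking against the model card.

```python
# Sketch of using upstage/Llama-2-70b-instruct with transformers.
# build_prompt and load_model are hypothetical helpers, not part of any library.

def build_prompt(system: str, user: str) -> str:
    """Render the '### System / ### User / ### Assistant' template.
    The exact blank-line spacing between sections is an assumption."""
    return f"### System:\n{system}\n\n### User:\n{user}\n\n### Assistant:\n"

def load_model():
    # Defined but not called here: loading a 70B checkpoint needs
    # substantial hardware. rope_scaling mirrors the dynamic factor-2
    # setting described in the usage considerations above.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("upstage/Llama-2-70b-instruct")
    model = AutoModelForCausalLM.from_pretrained(
        "upstage/Llama-2-70b-instruct",
        device_map="auto",
        rope_scaling={"type": "dynamic", "factor": 2.0},
    )
    return tokenizer, model

prompt = build_prompt("You are a helpful assistant.", "What is RoPE scaling?")
print(prompt)
```

In practice the rendered prompt would be tokenized and passed to `model.generate`, with the assistant's reply decoded from the tokens produced after the `### Assistant:` marker.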
Good For
- Applications requiring a large language model with strong instruction-following capabilities.
- Tasks benefiting from an extended context window, such as summarization of long documents or complex multi-turn conversations.
- General-purpose AI assistants and chatbots where robust performance on common benchmarks is desired.