Overview
Llama-3.1-Tulu-3-70B is a 70 billion parameter instruction-following model from AllenAI, built upon Meta's Llama 3.1 base model. It is part of the Tülu 3 family, which emphasizes fully open-source data, code, and training recipes. The model is primarily English-language and is licensed under the Llama 3.1 Community License Agreement.
Key Capabilities
- Instruction Following: Designed for state-of-the-art performance across a diversity of tasks, including general chat.
- Mathematical Reasoning: Shows strong performance on benchmarks like MATH and GSM8K.
- Instruction Following Evaluation (IFEval): Excels in complex instruction following scenarios.
- Open-Source Approach: Provides a comprehensive post-training package with open-source data, code, and recipes.
Performance Highlights
On a range of benchmarks, the Tülu 3 70B model achieves an average score of 76.0, outperforming Llama 3.1 70B Instruct (73.4) and Qwen 2.5 72B Instruct (71.5). Notable scores include:
- PopQA (15 shot): 46.5
- BigBenchHard (3 shot, CoT): 82.0
- MATH (4 shot CoT, Flex): 63.0
- GSM8K (8 shot, CoT): 93.5
- Safety (6 task avg.): 88.3
Usage Considerations
The model has limited safety training and does not include in-the-loop filtering, meaning it can produce problematic outputs. It is intended for research and educational use, and its fine-tuning involved datasets with outputs from third-party models, subject to their respective terms of use.