Overview
ParetoQaft/8B-Tulu-full is an 8-billion-parameter instruction-following model developed by the Allen Institute for AI, built on the Llama-3.1-8B base model. It is a key component of the Tülu 3 model family, which emphasizes fully open data, code, and recipes for advanced post-training techniques. The model is primarily English-language and is released under the Llama 3.1 Community License Agreement.
Key Capabilities & Performance
This model targets state-of-the-art performance across a variety of tasks, including general chat, mathematical reasoning, and instruction following. It is competitive with other 8B-class models and particularly excels in:
- MATH (4-shot CoT, Flex): Achieves 43.7, outperforming Llama 3.1 8B Instruct (42.5) and Qwen 2.5 7B Instruct (14.8).
- GSM8K (8-shot, CoT): Scores 87.6, surpassing Llama 3.1 8B Instruct (83.4).
- IFEval (prompt loose): Reaches 82.4, higher than Llama 3.1 8B Instruct (80.6) and Qwen 2.5 7B Instruct (74.7).
- BigBenchHard (3-shot, CoT): Scores 66.0, significantly higher than Llama 3.1 8B Instruct (62.8) and Qwen 2.5 7B Instruct (21.7).
Training & Usage
The model was fine-tuned on a mix of publicly available, synthetic, and human-created datasets, with a maximum sequence length of 4096 tokens during supervised fine-tuning (SFT). The recommended chat template follows a specific user/assistant format, and a default system prompt is provided for use in AI2 demos. Users should be aware that the Tülu 3 models have limited safety training and may produce problematic outputs if explicitly prompted.
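As a rough illustration of the user/assistant chat format mentioned above, the sketch below builds a prompt string by hand using Tülu-style `<|role|>` tags. The exact tags and the `build_tulu_prompt` helper are assumptions for illustration; in practice, prefer the tokenizer's built-in `apply_chat_template` method from Hugging Face `transformers`, which applies the template shipped with the model.

```python
def build_tulu_prompt(messages, system_prompt=None):
    """Assemble a chat prompt in a Tülu-style user/assistant format.

    NOTE: the <|system|>/<|user|>/<|assistant|> tags here are an
    assumption about the template, not the verified official format.
    messages is a list of {"role": ..., "content": ...} dicts.
    """
    parts = []
    if system_prompt is not None:
        parts.append(f"<|system|>\n{system_prompt}\n")
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}\n")
    # End with the assistant tag so the model generates the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_tulu_prompt(
    [{"role": "user", "content": "What is 17 * 23?"}],
    system_prompt="You are a helpful assistant.",
)
print(prompt)
```

With a loaded model, the equivalent call would be `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which avoids hard-coding the template at all.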
Good for
- Applications requiring strong instruction following.
- Tasks involving mathematical reasoning and problem-solving.
- Research and educational purposes, leveraging its open-source methodology.
- Developers looking for a Llama 3.1-based model with enhanced performance in specific benchmarks.