ParetoQaft/8B-Tulu-full

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Jan 10, 2026License:llama3.1Architecture:Transformer Warm

The ParetoQaft/8B-Tulu-full model is an 8 billion parameter instruction-following model from Allen Institute for AI, fine-tuned from Llama-3.1-8B. It is part of the Tülu3 family, known for its open-source data and post-training techniques. This model is designed for strong performance across diverse tasks, including mathematical reasoning (MATH, GSM8K) and instruction following (IFEval), with a 32768 token context length.

Loading preview...

Overview

ParetoQaft/8B-Tulu-full is an 8 billion parameter instruction-following model developed by the Allen Institute for AI, built upon the Llama-3.1-8B base model. It is a key component of the Tülu3 model family, which emphasizes fully open-source data, code, and recipes for advanced post-training techniques. The model is primarily English-language and is released under the Llama 3.1 Community License Agreement.

Key Capabilities & Performance

This model is engineered for state-of-the-art performance across a variety of tasks, including general chat, mathematical reasoning, and instruction following. It demonstrates competitive performance against other 8B-class models, particularly excelling in:

  • MATH (4 shot CoT, Flex): Achieves 43.7, outperforming Llama 3.1 8B Instruct (42.5) and Qwen 2.5 7B Instruct (14.8).
  • GSM8K (8 shot, CoT): Scores 87.6, surpassing Llama 3.1 8B Instruct (83.4).
  • IFEval (prompt loose): Reaches 82.4, higher than Llama 3.1 8B Instruct (80.6) and Qwen 2.5 7B Instruct (74.7).
  • BigBenchHard (3 shot, CoT): Scores 66.0, significantly higher than Llama 3.1 8B Instruct (62.8) and Qwen 2.5 7B Instruct (21.7).

Training & Usage

The model was fine-tuned using a mix of publicly available, synthetic, and human-created datasets. It supports a maximum sequence length of 4096 during SFT training. The recommended chat template follows a specific user/assistant format, and a default system prompt is provided for use in AI2 demos. Users should be aware that the Tülu3 models have limited safety training and may produce problematic outputs if explicitly prompted.

Good for

  • Applications requiring strong instruction following.
  • Tasks involving mathematical reasoning and problem-solving.
  • Research and educational purposes, leveraging its open-source methodology.
  • Developers looking for a Llama 3.1-based model with enhanced performance in specific benchmarks.