allenai/Llama-3.1-Tulu-3-8B-SFT

8B
FP8
32768
License: llama3.1

Overview

allenai/Llama-3.1-Tulu-3-8B-SFT is an 8-billion-parameter instruction-following model from the Tülu 3 family, developed by the Allen Institute for AI. It is fine-tuned from the meta-llama/Llama-3.1-8B base model. Its data, code, and training recipes are fully open source, so it also serves as a guide to modern post-training techniques. The model is primarily English-language and is licensed under the Llama 3.1 Community License Agreement.

Key Capabilities & Performance

This model performs well across a range of benchmarks, particularly on reasoning and instruction-following tasks. The final Tülu 3 8B model (after DPO and RLVR) generally scores higher, but the SFT version remains competitive:

  • Mathematical Reasoning: Achieves 31.5% on MATH (4-shot CoT, Flex) and 76.2% on GSM8K (8-shot CoT).
  • Instruction Following: Scores 72.8% on IFEval (prompt loose).
  • General Reasoning: Performs well on BigBenchHard (3-shot CoT) at 67.9% and PopQA (15-shot) at 29.3%.
  • Safety: Achieves 93.1% average across 6 safety tasks, which is the highest among the 8B models compared.

Training & Usage

The model was trained for 2 epochs with a maximum sequence length of 4096 and a learning rate of 5E-6. It can be loaded with Hugging Face's AutoModelForCausalLM or served with vLLM. The chat template uses <|user|> and <|assistant|> roles, and the suggested default system prompt for demos is "You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI."
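
The loading path described above can be sketched in Python as follows. This is a minimal illustration, not the authors' reference code: the helper names build_messages and generate_reply are hypothetical, the bfloat16 dtype and max_new_tokens value are assumptions, and device_map="auto" assumes the accelerate package is installed. The chat formatting is delegated to the template shipped with the tokenizer rather than written out by hand.

```python
from typing import Dict, List

MODEL_ID = "allenai/Llama-3.1-Tulu-3-8B-SFT"
DEFAULT_SYSTEM_PROMPT = (
    "You are Tulu 3, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)


def build_messages(
    user_prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT
) -> List[Dict[str, str]]:
    """Build a single-turn chat as role/content dicts (hypothetical helper)."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate one reply (hypothetical helper)."""
    # Heavy imports stay local so build_messages works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware supports bf16
        device_map="auto",           # assumption: accelerate is installed
    )

    # Let the tokenizer's own chat template insert the role markers.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

Calling generate_reply("How are you doing?") downloads the weights on first use, so it needs network access and enough memory for an 8B model. Passing system_prompt="" to build_messages omits the system turn entirely.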

Limitations

The Tülu 3 models have limited safety training and, unlike some commercial models, do not include in-the-loop filtering of responses, so they may produce problematic outputs when prompted to do so.