allenai/Llama-3.1-Tulu-3-8B-DPO

Text Generation · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 20, 2024 · License: llama3.1 · Architecture: Transformer

allenai/Llama-3.1-Tulu-3-8B-DPO is an 8 billion parameter instruction-following model from the Tülu3 family, fine-tuned with Direct Preference Optimization (DPO) on top of the Llama 3.1 base model. Developed by the Allen Institute for AI, it targets state-of-the-art performance across diverse tasks including chat, mathematical reasoning (MATH, GSM8K), and instruction following (IFEval), with a context length of 32768 tokens. The Tülu3 release includes fully open data, code, and recipes for modern post-training techniques.


Model Overview

allenai/Llama-3.1-Tulu-3-8B-DPO is an 8 billion parameter instruction-following model developed by the Allen Institute for AI (AllenAI). It is part of the Tülu3 family, which focuses on providing fully open-source data, code, and recipes for modern post-training techniques. This specific model is a Direct Preference Optimization (DPO) fine-tune of the allenai/Llama-3.1-Tulu-3-8B-SFT model, built upon the meta-llama/Llama-3.1-8B base.
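DPO trains the policy directly on preference pairs, pushing it to assign a higher likelihood margin to the chosen response than to the rejected one relative to a frozen reference model (here, the SFT checkpoint). A minimal sketch of the per-pair DPO loss, using illustrative log-probability values (the function name and inputs are assumptions for this example, not AllenAI's training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference (SFT) model. beta scales the implicit reward margin.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)), written stably as log1p(exp(-margin)).
    return math.log1p(math.exp(-margin))

# At indifference (no margin) the loss equals log(2) ~ 0.693; it drops
# below that as the policy favors the chosen response more than the
# reference does.
print(dpo_loss(-10.0, -14.0, -11.0, -13.0))
```

Minimizing this loss over many pairs increases the policy's preference margin without training a separate reward model.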

Key Capabilities & Performance

The Tülu3 models are designed for strong performance across a variety of tasks beyond general chat. The 8B DPO model demonstrates competitive results against other models in its class, particularly excelling in:

  • Mathematical Reasoning: Achieves 42.0 on MATH (4 shot CoT, Flex) and 84.3 on GSM8K (8 shot, CoT).
  • Instruction Following: Scores 81.1 on IFEval (prompt loose).
  • General Performance: An average score of 64.4 across evaluated benchmarks.
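For chat and instruction-following use, prompts must follow the Tülu chat format, which tags each turn with `<|user|>` or `<|assistant|>` markers. A minimal sketch of rendering a conversation into that format (the exact template should be taken from the tokenizer's `apply_chat_template`; this hand-rolled version is an illustrative assumption):

```python
def format_tulu_prompt(messages):
    """Render a list of {"role", "content"} messages into the Tülu chat
    format and end with <|assistant|> to cue the model's reply."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n")
    parts.append("<|assistant|>\n")  # generation starts after this tag
    return "".join(parts)

prompt = format_tulu_prompt([{"role": "user", "content": "What is DPO?"}])
print(prompt)
```

In practice, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the Hugging Face `transformers` library produces this string directly from the model's bundled template.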

Training & Licensing

The model was trained on a mix of publicly available, synthetic, and human-created datasets. It is released under Meta's Llama 3.1 Community License Agreement, and is additionally subject to terms from the Gemma and Qwen licenses because portions of the training data derive from those models. The training repository is available at allenai/open-instruct.

Usage Considerations

While designed for high performance, the Tülu3 models have limited safety training compared to proprietary models and may produce problematic outputs. Users should be aware of these limitations and refer to the Responsible Use Guidelines.