allenai/Llama-3.1-Tulu-3-8B-DPO

Text Generation · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 20, 2024 · License: llama3.1 · Architecture: Transformer

allenai/Llama-3.1-Tulu-3-8B-DPO is an 8 billion parameter instruction-following model from the Tülu3 family, fine-tuned with Direct Preference Optimization (DPO) on top of the Llama 3.1 base model. Developed by the Allen Institute for AI, it targets state-of-the-art performance across diverse tasks including chat, mathematical reasoning (MATH, GSM8K), and instruction following (IFEval), with a context length of 32768 tokens. The Tülu3 release includes fully open data, code, and recipes for modern post-training techniques.


Model Overview

allenai/Llama-3.1-Tulu-3-8B-DPO is an 8 billion parameter instruction-following model developed by the Allen Institute for AI (AllenAI). It is part of the Tülu3 family, which focuses on providing fully open-source data, code, and recipes for modern post-training techniques. This specific model is a Direct Preference Optimization (DPO) fine-tune of the allenai/Llama-3.1-Tulu-3-8B-SFT model, built upon the meta-llama/Llama-3.1-8B base.
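DPO trains the policy directly on preference pairs, pushing it to assign a higher likelihood margin to the chosen response than to the rejected one relative to a frozen reference model (here, the SFT checkpoint). A minimal sketch of the per-pair DPO loss, using illustrative log-probability values (the function name and inputs are assumptions for this example, not AllenAI's training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference (SFT) model. beta scales the implicit reward margin.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)), written stably as log1p(exp(-margin)).
    return math.log1p(math.exp(-margin))

# At indifference (no margin) the loss equals log(2) ~ 0.693; it drops
# below that as the policy favors the chosen response more than the
# reference does.
print(dpo_loss(-10.0, -14.0, -11.0, -13.0))
```

Minimizing this loss over many pairs increases the policy's preference margin without training a separate reward model.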

Key Capabilities & Performance

The Tülu3 models are designed for strong performance across a variety of tasks beyond general chat. The 8B DPO model demonstrates competitive results against other models in its class, particularly excelling in:

  • Mathematical Reasoning: Achieves 42.0 on MATH (4 shot CoT, Flex) and 84.3 on GSM8K (8 shot, CoT).
  • Instruction Following: Scores 81.1 on IFEval (prompt loose).
  • General Performance: An average score of 64.4 across evaluated benchmarks.
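For chat and instruction-following use, prompts must follow the Tülu chat format, which tags each turn with `<|user|>` or `<|assistant|>` markers. A minimal sketch of rendering a conversation into that format (the exact template should be taken from the tokenizer's `apply_chat_template`; this hand-rolled version is an illustrative assumption):

```python
def format_tulu_prompt(messages):
    """Render a list of {"role", "content"} messages into the Tülu chat
    format and end with <|assistant|> to cue the model's reply."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n")
    parts.append("<|assistant|>\n")  # generation starts after this tag
    return "".join(parts)

prompt = format_tulu_prompt([{"role": "user", "content": "What is DPO?"}])
print(prompt)
```

In practice, `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the Hugging Face `transformers` library produces this string directly from the model's bundled template.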

Training & Licensing

The model was trained on a mix of publicly available, synthetic, and human-created datasets. It is released under Meta's Llama 3.1 Community License Agreement, and is additionally subject to terms from the Gemma and Qwen licenses because portions of the training data derive from those models. The training repository is available at allenai/open-instruct.

Usage Considerations

While designed for high performance, the Tülu3 models have limited safety training compared to proprietary models and may produce problematic outputs. Users should be aware of these limitations and refer to the Responsible Use Guidelines.