allenai/Llama-3.1-Tulu-3-70B

Hugging Face
TEXT GENERATIONConcurrency Cost:4Model Size:70BQuant:FP8Ctx Length:32kPublished:Nov 20, 2024License:llama3.1Architecture:Transformer0.1K Warm

Llama-3.1-Tulu-3-70B is a 70 billion parameter instruction-following model developed by AllenAI, fine-tuned from Meta's Llama 3.1 base model. It offers a comprehensive post-training package with open-source data, code, and recipes. This model is designed for state-of-the-art performance across diverse tasks, including chat, mathematical reasoning (MATH, GSM8K), and instruction following (IFEval), with a context length of 32768 tokens.

Loading preview...

Overview

Llama-3.1-Tulu-3-70B is a 70 billion parameter instruction-following model from AllenAI, built upon Meta's Llama 3.1 base model. It is part of the Tülu 3 family, which emphasizes fully open-source data, code, and training recipes. The model is primarily English-language and is licensed under the Llama 3.1 Community License Agreement.

Key Capabilities

  • Instruction Following: Designed for state-of-the-art performance across a diversity of tasks, including general chat.
  • Mathematical Reasoning: Shows strong performance on benchmarks like MATH and GSM8K.
  • Instruction Following Evaluation (IFEval): Excels in complex instruction following scenarios.
  • Open-Source Approach: Provides a comprehensive post-training package with open-source data, code, and recipes.

Performance Highlights

On a range of benchmarks, the Tülu 3 70B model achieves an average score of 76.0, outperforming Llama 3.1 70B Instruct (73.4) and Qwen 2.5 72B Instruct (71.5). Notable scores include:

  • PopQA (15 shot): 46.5
  • BigBenchHard (3 shot, CoT): 82.0
  • MATH (4 shot CoT, Flex): 63.0
  • GSM8K (8 shot, CoT): 93.5
  • Safety (6 task avg.): 88.3

Usage Considerations

The model has limited safety training and does not include in-the-loop filtering, meaning it can produce problematic outputs. It is intended for research and educational use, and its fine-tuning involved datasets with outputs from third-party models, subject to their respective terms of use.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p