allenai/llama-3-tulu-2-70b

Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Context Length: 8k · Published: Jun 20, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

allenai/llama-3-tulu-2-70b is a 70 billion parameter language model developed by AllenAI, fine-tuned from Meta-Llama-3-70B. It is trained on a diverse mix of publicly available, synthetic, and human-created datasets to function as a helpful assistant. This model excels in general conversational tasks and instruction following, demonstrating strong performance across various benchmarks including MMLU, GSM8k, and HumanEval.


Model Overview

allenai/llama-3-tulu-2-70b is a 70 billion parameter language model developed by AllenAI, fine-tuned from the meta-llama/Meta-Llama-3-70B base model. It is designed to act as a helpful assistant, trained on a comprehensive mix of publicly available, synthetic, and human-created datasets. The training methodology is detailed in the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" [https://arxiv.org/abs/2311.10702].
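
For reference, the weights can typically be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch rather than an official example: it assumes the Hub ID above, the transformers and accelerate packages, and enough GPU memory (or CPU offloading) for a 70B-parameter model.

# Minimal loading sketch (assumptions: Hub ID as above, transformers +
# accelerate installed, enough GPU memory or CPU offloading for 70B weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/llama-3-tulu-2-70b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut the memory footprint
    device_map="auto",           # shard across available GPUs / offload to CPU
)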

Key Capabilities and Performance

This model demonstrates strong performance across a range of benchmarks, making it suitable for diverse applications:

  • General Instruction Following: Achieves an average score of 73.01 across various benchmarks, indicating robust performance as an assistant.
  • Reasoning: Scores 0.752 on MMLU (5-shot) and 0.845 on GSM8k (8-shot CoT), showcasing its reasoning abilities.
  • Code Generation: Attains 0.861 on Codex HumanEval Pass@10, highlighting its proficiency in coding tasks.
  • Truthfulness: Scores 0.646 on TruthfulQA %Info+True, indicating a good balance of informativeness and truthfulness.

Input Format

The model was trained with a specific input format, which should be followed for best generation quality:

<|user|>
Your message here!
<|assistant|>

Make sure to include a newline after <|assistant|>; this can noticeably affect generation quality.
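
As an illustration, the sketch below wraps a user message in this format (with the trailing newline after <|assistant|>) and generates a reply. It reuses the tokenizer and model from the loading sketch above; max_new_tokens=256 and greedy decoding are arbitrary choices for the example, not recommendations from the model card.

# Hypothetical generation example using the chat format described above.
prompt = "<|user|>\nYour message here!\n<|assistant|>\n"  # trailing newline matters

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)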

Intended Uses

This model is primarily intended for use as a helpful assistant, capable of handling a wide array of conversational and instruction-based tasks. Its fine-tuning on a diverse dataset makes it adaptable to various general-purpose applications requiring strong language understanding and generation.