allenai/open-instruct-sni-13b

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4K · Published: Jun 7, 2023 · Architecture: Transformer

The allenai/open-instruct-sni-13b model is a 13 billion parameter LLaMa-based language model developed by the Allen Institute for AI. It is fine-tuned on the Super-Natural Instructions dataset, which enhances its ability to follow diverse instructions. The model is distributed as a weight diff and requires a base LLaMa model to recover the usable weights. It is designed for general instruction-following tasks and has been evaluated on benchmarks including MMLU, Codex-Eval, and TydiQA.

Overview

allenai/open-instruct-sni-13b is a 13 billion parameter LLaMa-based model developed by the Allen Institute for AI. It was fine-tuned on the Super-Natural Instructions dataset, which focuses on improving a model's ability to understand and execute a wide range of instructions. The model is released as a weight difference (diff) and requires a pre-existing LLaMa model in Hugging Face format to recover the usable weights.

Key Capabilities & Features

  • Instruction Following: Enhanced through fine-tuning on the Super-Natural Instructions dataset.
  • LLaMa Architecture: Built upon the LLaMa foundation model, leveraging its robust language understanding.
  • Model Recovery: Uses a weight_diff.py script to combine the provided diff with a base LLaMa model (a sketch of the underlying arithmetic follows this list).
  • Specific Input Format: Expects prompts formatted as <|user|>\nYour message here!\n<|assistant|>\n; the newline after <|assistant|> is critical for output quality (see the inference example at the end of this page).
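
The repository's weight_diff.py script handles recovery end to end; the snippet below is only a minimal sketch of the underlying idea (tuned weights = base weights + diff weights). The paths are placeholders and the exact script interface may differ, so prefer the official script from the allenai/open-instruct repository in practice.

```python
# Minimal sketch of weight-diff recovery: tuned = base + diff.
# Paths are placeholders; in practice, use the official weight_diff.py
# script from the allenai/open-instruct repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf", torch_dtype=torch.float32
)
diff = AutoModelForCausalLM.from_pretrained(
    "allenai/open-instruct-sni-13b", torch_dtype=torch.float32
)

base_sd = base.state_dict()
diff_sd = diff.state_dict()

# Add each diff tensor onto the corresponding base tensor in place.
with torch.no_grad():
    for name in base_sd:
        base_sd[name].add_(diff_sd[name])

base.save_pretrained("path/to/open-instruct-sni-13b-recovered")

# The tokenizer ships with the diff release; save it alongside the weights.
AutoTokenizer.from_pretrained("allenai/open-instruct-sni-13b").save_pretrained(
    "path/to/open-instruct-sni-13b-recovered"
)
```

Note that this loads two full fp32 copies of a 13B model (roughly 50 GB each), so it needs substantial CPU RAM.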

Performance Highlights

As reported in the paper "How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources", the model achieves the following benchmark scores:

  • MMLU (5-shot): 50.8
  • Codex-Eval Pass@1: 8.2
  • TydiQA Gold-Passage: 51.4

When to Use This Model

This model is suitable for developers who want an instruction-tuned LLaMa-based model for general natural language understanding and generation tasks. Its fine-tuning on a diverse instruction dataset makes it a strong candidate for applications that require robust instruction following. Users must first perform the recovery step with a base LLaMa model, as described above.
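
Once the weights are recovered, inference follows the standard Hugging Face transformers flow. The sketch below assumes the recovered checkpoint was saved to the placeholder path used above; the prompt string shows the required input format, including the newline after <|assistant|>.

```python
# Minimal inference sketch with Hugging Face transformers.
# "path/to/open-instruct-sni-13b-recovered" is a placeholder for the
# directory produced by the recovery step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/open-instruct-sni-13b-recovered"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# The expected input format; the trailing newline after <|assistant|>
# is critical for output quality.
prompt = "<|user|>\nWrite a haiku about instruction tuning.\n<|assistant|>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```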