allenai/open-instruct-flan-v2-13b

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | Published: Jun 7, 2023 | Architecture: Transformer | Cold

The allenai/open-instruct-flan-v2-13b model is a 13-billion-parameter LLaMa-based language model developed by AllenAI. It is fine-tuned on the Flan V2 dataset to improve instruction following across a wide range of tasks. The model is distributed as a weight diff, so an existing LLaMa model is required to recover the full weights. It is intended for general instruction-following applications.


Overview

This model, allenai/open-instruct-flan-v2-13b, is a 13-billion-parameter LLaMa model fine-tuned by AllenAI on the Flan V2 dataset. It was developed as part of the research presented in the paper "How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources." The model is distributed as a weight difference rather than as full weights, so users must recover the full model from an existing LLaMa base model using a provided script.
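To illustrate what recovering a model from a weight difference means, here is a minimal conceptual sketch. It assumes the diff is a plain elementwise difference between the fine-tuned and base parameters, which may not match everything the actual recovery script does (it may, for example, perform additional checks). The function name `recover_weights` and the list-based `StateDict` type are hypothetical and used only for illustration; real checkpoints hold tensors, not Python lists.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def recover_weights(base: StateDict, diff: StateDict) -> StateDict:
    """Return base + diff, parameter by parameter.

    Assumes the diff was produced as (fine-tuned - base), so adding it
    back to the base weights yields the fine-tuned weights.
    """
    if base.keys() != diff.keys():
        raise ValueError("base and diff must contain the same parameter names")

    recovered: StateDict = {}
    for name, base_values in base.items():
        diff_values = diff[name]
        if len(base_values) != len(diff_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Elementwise addition restores the fine-tuned parameter.
        recovered[name] = [b + d for b, d in zip(base_values, diff_values)]
    return recovered
```

The sketch only conveys the idea that neither the base model nor the diff is usable on its own; for the real model, use the script provided by the authors.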

Key Capabilities

  • Instruction Following: Enhanced through fine-tuning on the comprehensive Flan V2 dataset.
  • General-Purpose Language Tasks: Capable of handling a broad spectrum of natural language understanding and generation tasks.
  • Benchmark Performance: Reports an average score of 25.1 across a set of benchmarks that includes MMLU (51.2, 5-shot), GSM CoT (21.0), BBH CoT (39.2), and Codex-Eval Pass@10 (16.2).

Usage and Input Format

To use this model, users must first recover it from the provided weight diff using the weight_diff.py script from the allenai/open-instruct repository. The model expects inputs formatted with specific user and assistant tags:

<|user|>
Your message here!
<|assistant|>

It is crucial to include a newline after <|assistant|> for optimal generation quality.
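The input format above can be sketched as a small helper that wraps a single user message in the expected tags and appends the trailing newline. The function name `format_prompt` is hypothetical, and the sketch covers only the single-turn case shown here.

```python
def format_prompt(message: str) -> str:
    """Wrap one user message in the <|user|> / <|assistant|> tags.

    The string ends with a newline after <|assistant|>, as the usage
    notes recommend.
    """
    return f"<|user|>\n{message}\n<|assistant|>\n"


if __name__ == "__main__":
    # Print with repr() so the newlines are visible.
    print(repr(format_prompt("Your message here!")))
```

The resulting string can then be tokenized and passed to the recovered model for generation.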

Good for

  • Researchers and developers looking for an instruction-tuned LLaMa model.
  • Applications requiring robust instruction-following capabilities.
  • Experimentation with models fine-tuned on the Flan V2 dataset.