Overview
allenai/open-instruct-baize-13b is a 13-billion-parameter LLaMa model developed by AllenAI. It has been fine-tuned on the Baize dataset, a collection of self-chat data, to improve its instruction-following capabilities. The model is released as a "model diff": to recover the full model, users apply the diff to an existing LLaMa model in Hugging Face format. The training methodology and evaluation are detailed in the paper "How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources" (arXiv:2306.04751).
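As a conceptual illustration of what "applying a diff" means, the sketch below assumes the diff is purely additive (recovered weight = base weight + diff) and uses plain Python lists in place of tensors. The helper name apply_diff and the toy parameter name are hypothetical; this is not the repository's weight_diff.py script, which should be used for actual recovery.

```python
def apply_diff(base, diff):
    """Hypothetical helper: add each diff value to the matching base weight.

    Assumes an additive diff over identically named, identically sized
    parameters. Real checkpoints hold tensors, not Python lists.
    """
    if base.keys() != diff.keys():
        raise ValueError("base and diff must contain the same parameter names")
    recovered = {}
    for name in base:
        if len(base[name]) != len(diff[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        recovered[name] = [b + d for b, d in zip(base[name], diff[name])]
    return recovered


# Toy example with a single made-up parameter.
base = {"layer.weight": [1.0, 2.0, 3.0]}
diff = {"layer.weight": [0.5, -1.0, 0.25]}
recovered = apply_diff(base, diff)
```

Distributing only the diff means the base LLaMa weights never ship with this release, which is why access to the original model is a prerequisite.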
Key Capabilities & Performance
This model is designed for general instruction-following and conversational tasks. It was evaluated on a range of benchmarks, including MMLU, GSM, BBH, TydiQA, and Codex-Eval. Reported scores include:
- MMLU (0-shot): 43.5
- MMLU (5-shot): 41.5
- GSM Direct: 4.5
- GSM CoT: 8.5
- BBH Direct: 35.3
- BBH CoT: 36.7
- Codex-Eval Pass@1: 14.5
- Codex-Eval Pass@10: 27.3
Usage Considerations
To use this model, first obtain access to a LLaMa model and convert it to the Hugging Face format. Then run the weight_diff.py script from the allenai/open-instruct repository to apply the diff and recover the full model. Inputs must be formatted with <|user|> and <|assistant|> tags, and a newline must follow <|assistant|>; this newline is crucial for generation quality. The model is licensed under an AI model license along with the original LLaMa license.
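A minimal sketch of building a single-turn prompt in this format is shown below. The function name format_prompt is hypothetical, and placing each tag on its own line (with the user message between them) is an assumption about the layout; the part the text above does state is that a newline follows <|assistant|>.

```python
def format_prompt(user_message: str) -> str:
    """Hypothetical helper: wrap a user message in the chat tags.

    Assumed layout: each tag on its own line, ending with the newline
    after <|assistant|> that the model expects.
    """
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


prompt = format_prompt("Summarize the plot of Hamlet in two sentences.")
```

The resulting string can be passed to a tokenizer as-is; the trailing newline should not be stripped before generation.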