Open-Orca/Mistral-7B-OpenOrca: High-Performance 7B Model
Open-Orca/Mistral-7B-OpenOrca, affectionately codenamed "MistralOrca," is a 7-billion-parameter language model developed by the Open-Orca team. It is fine-tuned from the Mistral 7B base model on a carefully curated subset of the OpenOrca dataset, a collection of GPT-4-augmented data that aims to reproduce the dataset-generation methodology of Microsoft Research's Orca paper.
Key Capabilities & Performance
This model demonstrates strong performance for its size and runs efficiently on consumer-grade GPUs. At release, it ranked #1 on the HuggingFace leaderboard among models smaller than 30B parameters, surpassing all other 7B and 13B models. Key performance metrics include:
- HuggingFace Leaderboard Average: 65.84 (106% of base Mistral-7B, 98.6% of Llama2-70b-chat)
- MMLU (5-shot): 62.24
- AGIEval Average: 0.397 (129% of base Mistral-7B)
- BigBench-Hard Average: 0.416 (119% of base Mistral-7B)
- MT-Bench Average: 6.86 (on-par with Llama2-70b-chat)
Training & Usage
The model was trained for 62 hours across 4 epochs on 8x A6000 GPUs. It uses the OpenAI Chat Markup Language (ChatML) prompt format, with its `<|im_start|>` and `<|im_end|>` special tokens, and supports the `apply_chat_template()` method in HuggingFace Transformers for easy conversation formatting. Quantized versions (AWQ, GPTQ, GGUF) are available from TheBloke for optimized inference.
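To make the prompt format concrete, here is a minimal sketch of how a conversation is rendered into ChatML text. The helper function and the example messages are illustrative, not part of the model card; in practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from HuggingFace Transformers produces the equivalent string using the template shipped with the model.

```python
# Sketch of the ChatML prompt format used by MistralOrca.
# The role names and message contents below are illustrative assumptions.
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into ChatML text."""
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are MistralOrca, a helpful assistant."},
    {"role": "user", "content": "Explain ChatML in one sentence."},
]
print(format_chatml(messages))
```

The trailing `<|im_start|>assistant\n` is what cues the model to generate a reply; generation is typically stopped when the model emits `<|im_end|>`.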