Infinity-Instruct-3M-0625-Mistral-7B Overview
Infinity-Instruct-3M-0625-Mistral-7B is a 7-billion-parameter instruction-tuned model from the Beijing Academy of Artificial Intelligence (BAAI), built on the Mistral-7B-v0.1 base. It is trained with supervised instruction tuning only, notably without reinforcement learning from human feedback (RLHF), which makes its benchmark results striking for a purely supervised recipe. The model is fine-tuned on BAAI's Infinity-Instruct-3M and Infinity-Instruct-0625 datasets.
Key Capabilities & Performance
- Strong Instruction Following: Scores 31.42 on AlpacaEval 2.0 (judged by GPT-4), ahead of GPT-3.5 Turbo (22.7) and Mixtral 8x7B v0.1 (23.7).
- Multi-turn Dialogue: Scores 8.1 on MT-Bench, indicating robust performance in complex multi-turn conversations, comparable to Llama-3-8B-Instruct and Mistral-7B-Instruct-v0.2.
- Foundational Ability Enhancement: The training process involved an initial fine-tuning phase on Infinity-Instruct-3M to improve foundational abilities, including math and code, before further refinement for chat capabilities.
Training Details
The model undergoes a two-stage fine-tuning process. First, the base Mistral-7B-v0.1 is fine-tuned on Infinity-Instruct-3M to produce a foundational instruct model; this model is then further fine-tuned on Infinity-Instruct-0625 to strengthen its chat capabilities. Training uses sample packing (concatenating multiple samples into fixed-length sequences to eliminate padding tokens) along with other acceleration techniques to reduce cost.
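The sample-packing idea mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not BAAI's actual training pipeline: tokenized samples are concatenated into one token stream (separated by an assumed EOS id) and split into full fixed-length blocks, so no padding tokens are needed.

```python
def pack_samples(tokenized_samples, block_size, eos_id):
    """Concatenate tokenized samples into a single stream, separated by EOS,
    then split the stream into padding-free blocks of exactly block_size tokens."""
    stream = []
    for sample in tokenized_samples:
        stream.extend(sample)
        stream.append(eos_id)  # mark the sample boundary
    # Drop the trailing partial block so every emitted block is full.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Toy token ids; a real pipeline would use the tokenizer's actual EOS id.
samples = [[101, 5, 6], [101, 7], [101, 8, 9, 10, 11]]
blocks = pack_samples(samples, block_size=4, eos_id=2)
print(blocks)  # three full blocks of 4 tokens, no padding anywhere
```

The trade-off is that a block can span two unrelated samples; training code typically handles this with an attention mask or simply accepts the small amount of cross-sample attention.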
Use Cases
This model is well-suited for general-purpose instruction following and chat applications where high-quality responses to diverse prompts are required. Its strong performance on benchmarks without RLHF makes it an interesting option for developers seeking capable models with a simpler training paradigm.
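For chat use, Mistral-7B derivatives commonly expect the Mistral instruct prompt format. The helper below is a hedged sketch of that assumed format ([INST] ... [/INST] with EOS-terminated assistant turns); in practice, verify the exact template in this model's tokenizer config and prefer `tokenizer.apply_chat_template` from Hugging Face transformers, which applies the template shipped with the model.

```python
def build_prompt(messages):
    """Render alternating user/assistant messages into a single prompt string
    using the Mistral-style [INST] format (an assumption; check the model's
    tokenizer_config.json for the authoritative chat template)."""
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f" {msg['content']}</s>"
        else:
            raise ValueError(f"unsupported role: {msg['role']}")
    return prompt

history = [
    {"role": "user", "content": "What is instruction tuning?"},
    {"role": "assistant", "content": "Supervised fine-tuning on prompt-response pairs."},
    {"role": "user", "content": "Does it require RLHF?"},
]
print(build_prompt(history))
```

The rendered string would then be tokenized and passed to the model's generate call; using the tokenizer's built-in chat template avoids format drift between training and inference.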