Overview
Infinity-Instruct-3M-0625-Llama3-70B is a 70-billion-parameter instruction-tuned model from the Beijing Academy of Artificial Intelligence (BAAI). It is trained purely with supervised fine-tuning (SFT), without reinforcement learning from human feedback (RLHF). The model is fine-tuned on BAAI's openly released Infinity-Instruct-3M and Infinity-Instruct-0625 datasets, which contain millions of instruction examples.
Key Capabilities & Performance
- Instruction Following: Excels at general instruction-following tasks, as reflected in its AlpacaEval 2.0 and MT-Bench scores.
- Benchmark Results: Achieves 38.0 on AlpacaEval 2.0, surpassing GPT4-0613 (30.2) and the official Llama-3-70B-Instruct (34.4) on that benchmark, and scores 8.9 on MT-Bench.
- Training Methodology: Uses a two-stage fine-tuning process: first on Infinity-Instruct-3M to strengthen foundational abilities (math and code), then on Infinity-Instruct-0625 for stronger chat capabilities.
- Efficiency: Training leverages techniques from FlagScale to reduce padding tokens and accelerate the training procedure, lowering compute cost.
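The padding-reduction idea can be illustrated with a minimal greedy sequence-packing sketch. This is an illustration only, not FlagScale's actual implementation; the first-fit strategy and the `max_len` value are assumptions for the example:

```python
def pack_sequences(lengths, max_len=8):
    """Greedy first-fit packing: group variable-length sequences into
    bins of capacity max_len so each batch row carries less padding."""
    bins = []  # each bin is a list of sequence lengths
    for length in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + length <= max_len:
                b.append(length)
                break
        else:
            bins.append([length])
    return bins

def padding_tokens(bins, max_len):
    """Total padding needed when each bin is padded out to max_len."""
    return sum(max_len - sum(b) for b in bins)

lengths = [7, 5, 3, 3, 2, 1, 1]
packed = pack_sequences(lengths, max_len=8)
naive_pad = len(lengths) * 8 - sum(lengths)   # one sequence per padded row
packed_pad = padding_tokens(packed, 8)        # packed rows, far less waste
```

With these example lengths, packing shrinks seven padded rows into three and cuts padding tokens from 34 to 2, which is the effect the FlagScale optimization targets at scale.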
When to Use This Model
- Instruction Following Applications: Ideal for scenarios requiring robust instruction adherence and conversational abilities.
- Research & Development: Suitable for academic research, particularly in exploring instruction tuning without RLHF.
- Cost-Effective Solutions: Matches or exceeds proprietary models like GPT4-0613 on certain benchmarks while being openly available, potentially providing a more efficient alternative for specific use cases.
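Since the model is based on Llama-3, prompts are expected to follow the Llama-3 chat template. A minimal sketch of building such a prompt by hand is below; in practice `tokenizer.apply_chat_template` from `transformers` handles this automatically, and the special tokens shown are assumptions that should be verified against the model's tokenizer config:

```python
def build_llama3_prompt(messages):
    """Format a list of {role, content} dicts using the Llama-3 chat
    template (special tokens assumed; verify against the tokenizer)."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Trailing assistant header cues the model to generate its reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt([
    {"role": "user", "content": "Summarize the benefits of SFT-only training."},
])
```

The resulting string can be tokenized and passed to the model for generation like any other Llama-3-family checkpoint.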