BAAI/Infinity-Instruct-7M-Gen-mistral-7B
Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4K · Concurrency Cost: 1 · Published: Jul 25, 2024 · License: apache-2.0 · Architecture: Transformer

Infinity-Instruct-7M-Gen-Mistral-7B is a 7 billion parameter instruction-tuned causal language model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is fine-tuned on the Infinity-Instruct-7M and Infinity-Instruct-Gen datasets without reinforcement learning from human feedback (RLHF). On AlpacaEval 2.0 it compares favorably with larger models such as Mixtral 8x7B v0.1, Gemini Pro, and some GPT-4 versions, making it suitable for general instruction-following and chat applications.


Infinity-Instruct-7M-Gen-Mistral-7B Overview

Infinity-Instruct-7M-Gen-Mistral-7B is a 7 billion parameter instruction-tuned language model developed by the Beijing Academy of Artificial Intelligence (BAAI). This model is notable for being trained exclusively with supervised instruction tuning, without the use of reinforcement learning from human feedback (RLHF).

Key Capabilities and Training

The model is built upon the Mistral-7B-v0.1 base and undergoes a two-stage fine-tuning process:

  • Foundational Tuning: Initially, it's fine-tuned on the Infinity-Instruct-7M dataset to enhance foundational abilities, particularly in areas like mathematics and code.
  • Generative Tuning: Subsequently, it's further fine-tuned on the Infinity-Instruct-Gen dataset to develop into a stronger chat model.

This training methodology, utilizing million-level instruction datasets, has enabled the model to achieve competitive performance.
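The supervised-only recipe above boils down to rendering each instruction-response pair into a single training string. The exact chat template used by BAAI is not stated here; the Mistral-style `[INST]` format below is an illustrative assumption, not the model's confirmed template:

```python
# Hypothetical sketch: rendering one instruction/response pair into a
# supervised fine-tuning string. The Mistral-style [INST] wrapping is an
# ASSUMPTION for illustration; consult the model's tokenizer_config.json
# on the Hub for the actual chat template.

def format_sft_example(instruction: str, response: str) -> str:
    """Render an instruction/response pair as a single training string."""
    return f"<s>[INST] {instruction} [/INST] {response}</s>"

if __name__ == "__main__":
    print(format_sft_example(
        "Write a haiku about autumn.",
        "Leaves drift on cold wind...",
    ))
```

In a two-stage setup like the one described, the same formatting would be applied first to the Infinity-Instruct-7M examples and then to the Infinity-Instruct-Gen examples, with training resumed from the stage-one checkpoint.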

Performance Highlights

Despite its 7B parameter size and lack of RLHF, Infinity-Instruct-7M-Gen-Mistral-7B shows strong results on benchmarks:

  • AlpacaEval 2.0: Achieves a score of 40.0, outperforming models like Mixtral 8x7B v0.1 (23.7), Gemini Pro (24.4), and even some GPT-4 versions (e.g., GPT-4-0613 at 30.2).
  • MT-Bench: Scores 8.1.
  • Arena-hard: Scores 26.9.

Use Cases

This model is well-suited for general instruction-following tasks and conversational AI applications where a performant, instruction-tuned model without RLHF is desired. Its strong AlpacaEval 2.0 performance suggests good alignment with human preferences for helpful chat responses, even without preference-based post-training.
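For chat use, the model can be driven through Hugging Face transformers. The sketch below assumes the repository ships a chat template usable via `tokenizer.apply_chat_template` (worth verifying on the Hub) and that enough GPU memory is available for a 7B model; it is a minimal illustration, not the official usage snippet:

```python
# Minimal inference sketch for BAAI/Infinity-Instruct-7M-Gen-mistral-7B.
# ASSUMPTION: the repo provides a chat template for apply_chat_template;
# check the model card / tokenizer_config.json before relying on this.

MODEL_ID = "BAAI/Infinity-Instruct-7M-Gen-mistral-7B"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the message format expected by
    tokenizer.apply_chat_template."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    # Heavy imports and the model download happen only when run directly.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    input_ids = tokenizer.apply_chat_template(
        build_messages("Explain the difference between a list and a tuple."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                           skip_special_tokens=True))
```

Since the model was tuned without RLHF, sampling parameters (temperature, top-p) may need more manual adjustment than with preference-tuned chat models.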