abdelac/Mistral_Test
The Mistral-7B-Instruct-v0.1 model by Mistral AI is a 7 billion parameter instruction-tuned large language model, fine-tuned from the Mistral-7B-v0.1 generative text model. It incorporates architectural choices like Grouped-Query Attention and Sliding-Window Attention, and uses a Byte-fallback BPE tokenizer. This model is designed for instruction-following tasks, leveraging publicly available conversation datasets for its fine-tuning.
Mistral-7B-Instruct-v0.1 Overview
This model is an instruction-tuned version of the Mistral-7B-v0.1 generative text model, developed by Mistral AI. It has 7 billion parameters and is fine-tuned using a variety of publicly available conversation datasets, making it suitable for instruction-following applications.
Key Architectural Features
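- Grouped-Query Attention: Shares each key/value head across a group of query heads, which speeds up inference and reduces memory requirements.
- Sliding-Window Attention: Limits each token's attention to a fixed-size window of recent tokens, which keeps the cost of handling longer sequences manageable.
- Byte-fallback BPE tokenizer: Falls back to raw bytes for characters missing from the vocabulary, so any input can be tokenized without unknown tokens.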
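The first two features can be illustrated with a minimal sketch in plain Python. It is not Mistral AI's implementation: the function names, the toy sizes (5 positions, a window of 3, 8 query heads sharing 2 key/value heads), and the convention that the window includes the current token are all assumptions chosen for readability.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumption: attention is causal (j <= i) and the window counts the
    current token, so each row has at most `window` True entries.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the key/value head shared by query head `q_head`."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per key/value head
    return q_head // group_size


if __name__ == "__main__":
    # Print the mask as a grid: "x" means attention is allowed.
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # Toy sizes (hypothetical): 8 query heads grouped onto 2 key/value heads.
    print([kv_head_for_query(q, n_q_heads=8, n_kv_heads=2) for q in range(8)])
```

Because each row of the mask holds at most `window` allowed positions, the work per token stays bounded as the sequence grows. Likewise, storing only `n_kv_heads` sets of keys and values is what shrinks the memory footprint relative to one set per query head.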
Instruction Format
Prompts for this model should follow a specific instruction format in which each instruction is enclosed by [INST] and [/INST] tokens. The transformers library's apply_chat_template() method, available on the tokenizer, is the recommended way to format messages, because it applies the format the model was fine-tuned on.
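To make the format concrete, here is a rough sketch that assembles such a prompt by hand. The helper `build_prompt` is hypothetical, and the `<s>` / `</s>` token strings and the exact whitespace around each turn are assumptions; the tokenizer's own chat template remains the authoritative definition.

```python
BOS = "<s>"   # assumed text of the beginning-of-sequence token
EOS = "</s>"  # assumed text of the end-of-sequence token


def build_prompt(messages: list[dict[str, str]]) -> str:
    """Hypothetical helper: render alternating user/assistant turns as one string."""
    prompt = BOS
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            # Each instruction is wrapped in [INST] ... [/INST].
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Assumption: a finished assistant turn ends with EOS and a space.
            prompt += f"{message['content']}{EOS} "
    return prompt


if __name__ == "__main__":
    conversation = [
        {"role": "user", "content": "What is your favourite condiment?"},
        {"role": "assistant", "content": "Fresh lemon juice."},
        {"role": "user", "content": "Do you have mayonnaise recipes?"},
    ]
    print(build_prompt(conversation))
```

With transformers installed, passing the same list of messages to `tokenizer.apply_chat_template(messages, tokenize=False)` returns the string produced by the model's own template, which is preferable to hand-built strings like the one above.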
Limitations
The Mistral 7B Instruct model is a quick demonstration of fine-tuning capabilities and currently has no built-in moderation mechanisms. Users should account for this when deploying the model in environments that require moderated outputs. Community engagement is encouraged to develop guardrails.