Overview
This model, dimodimodimo/Mistral-7B-Instruct-v0.2, is an instruction-tuned variant of the Mistral-7B-v0.2 base model developed by Mistral AI. It builds on the original Mistral-7B architecture, with changes aimed at better performance on instruction-following tasks.
Key Architectural Changes (vs. Mistral-7B-v0.1)
- Expanded Context Window: Features a 32,000-token context window, up from 8,000 tokens in v0.1, so the combined prompt and generated output can be much longer.
- Rope-theta Update: Uses a rope-theta value of 1e6. Rope-theta is the base of the rotary position embedding (RoPE) frequencies, so it affects how positions in longer sequences are encoded.
- No Sliding-Window Attention: This version does not employ Sliding-Window Attention, differentiating its architectural approach from some other models in its class.
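To make the rope-theta point concrete, the sketch below computes the standard RoPE inverse frequencies, theta^(-2i/d), and the wavelength of the slowest-rotating dimension pair for two base values. It is an illustration only, not the model's implementation: the head dimension of 128 and the comparison value of 1e4 are assumptions not stated in this document, and the helper names are hypothetical.

```python
import math
from typing import List


def rope_inv_freq(theta: float, head_dim: int) -> List[float]:
    """Return the RoPE inverse frequencies theta^(-2i/d) for each rotary pair i."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta: float, head_dim: int) -> float:
    """Wavelength, in token positions, of the slowest-rotating pair."""
    return 2 * math.pi / rope_inv_freq(theta, head_dim)[-1]


HEAD_DIM = 128  # assumption: per-head dimension, not given in this document

# 1e6 is the value stated above; 1e4 is an assumed smaller base for comparison.
for theta in (1e4, 1e6):
    wavelength = longest_wavelength(theta, HEAD_DIM)
    print(f"theta={theta:.0e}: longest wavelength ~ {wavelength:,.0f} positions")
```

A larger base stretches the slowest rotations over more positions, which is one reason a larger rope-theta is commonly paired with a longer context window.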
Instruction Format
To benefit from the instruction fine-tuning, wrap each prompt in [INST] and [/INST] tokens. The very first instruction must be preceded by a beginning-of-sentence (BOS) token ID; later instructions are not. Hugging Face's apply_chat_template() method on the tokenizer applies this format automatically.
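As a rough illustration of the format described above, the following sketch assembles a prompt string by hand. The build_prompt helper is hypothetical, and the "<s>" and "</s>" token strings, the closing of assistant turns with an end-of-sentence token, and the exact whitespace are assumptions based on the upstream Mistral format; the tokenizer's own chat template remains the authoritative source.

```python
from typing import Dict, List

BOS = "<s>"   # assumed string form of the beginning-of-sentence token
EOS = "</s>"  # assumed string form of the end-of-sentence token


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Format alternating user/assistant turns in the [INST] ... [/INST] style.

    Hypothetical helper for illustration; whitespace may differ from the
    tokenizer's actual chat template.
    """
    prompt = BOS  # BOS appears once, before the first instruction only
    for i, message in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        content = message["content"].strip()
        if message["role"] == "user":
            prompt += f"[INST] {content} [/INST]"
        else:
            # Assumption: each completed assistant answer is closed with EOS.
            prompt += f" {content}{EOS}"
    return prompt


conversation = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
print(build_prompt(conversation))
```

If a hand-built string like this is passed to a tokenizer, the tokenizer may prepend its own BOS token; with Hugging Face tokenizers, passing add_special_tokens=False is one way to avoid a duplicate. In most cases it is simpler to pass the same list of messages to apply_chat_template() and let it handle the special tokens.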
Limitations
The Mistral 7B Instruct model is presented as a demonstration of the base model's fine-tuning potential. It has no built-in moderation mechanisms, so deployments that require content moderation need external guardrails.