dizza01/Mistral-7B-Instruct-v0.2
dizza01/Mistral-7B-Instruct-v0.2 is a 7-billion-parameter instruction-tuned causal language model developed by Mistral AI. It is an instruction fine-tuned version of Mistral-7B-v0.2, with a 32k context window and a modified RoPE theta value. The model is optimized for following instructions and generating coherent text from user prompts, which makes it suitable for general-purpose conversational AI and instruction-following tasks.
Mistral-7B-Instruct-v0.2 Overview
This model is an instruction-tuned variant of the Mistral-7B-v0.2 large language model, developed by the Mistral AI team. It builds on the base Mistral-7B-v0.2 model, with the main improvement being better instruction following.
Key Enhancements & Features
- Instruction Fine-tuning: The model has been fine-tuned to better understand and respond to instructions, making it more effective for chat and prompt-based interactions.
- Expanded Context Window: Features a 32k context window, a significant increase from the 8k context in its predecessor, Mistral-7B-v0.1, allowing for processing longer inputs and maintaining more extensive conversational history.
- Modified Architecture: Sets RoPE theta (the rotary position embedding base) to 1e6 and removes sliding-window attention, so every token attends over the full context.
- Instruction Format: Uses a specific [INST] and [/INST] token format for instruction processing, which can be applied via apply_chat_template() in Hugging Face Transformers.
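To make the instruction format concrete, here is a minimal sketch in plain Python of how a conversation could be laid out with [INST] and [/INST] markers. The function name build_inst_prompt is hypothetical, and the exact whitespace and the `<s>` / `</s>` strings are assumptions. The chat template shipped with the tokenizer remains authoritative, so prefer apply_chat_template() in real use.

```python
def build_inst_prompt(messages, bos="<s>", eos="</s>"):
    """Lay out alternating user/assistant messages in [INST] ... [/INST] form.

    Illustrative only: whitespace and special-token strings are assumptions,
    and the tokenizer's own chat template should be used for real inference.
    """
    parts = [bos]
    for index, message in enumerate(messages):
        role, content = message["role"], message["content"]
        # Turns are expected to alternate: user, assistant, user, ...
        expected = "user" if index % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(
                f"message {index} has role {role!r}, expected {expected!r}"
            )
        if role == "user":
            # User turns are wrapped in the instruction markers.
            parts.append(f"[INST] {content} [/INST]")
        else:
            # Assistant turns follow directly and are closed with EOS.
            parts.append(f"{content}{eos}")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Name one prime number."},
    {"role": "assistant", "content": "7 is a prime number."},
    {"role": "user", "content": "Name another."},
]
print(build_inst_prompt(conversation))
```

The final user turn is left open after [/INST], which is where the model would continue with its reply.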
Usage and Limitations
This model is designed for general instruction-following tasks. While it demonstrates compelling performance for its size, it has no built-in moderation mechanisms, so users should implement their own guardrails for deployments that require moderated outputs. The model can be used for inference with the mistral_common, mistral_inference, and Hugging Face transformers libraries.
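As a rough illustration of the Hugging Face transformers route, the sketch below wraps loading and generation in a single function. The name generate_reply and the choices of float16 weights and greedy decoding are assumptions for illustration. The default repository id is taken from this page's title and is assumed to load like the upstream Mistral weights. Calling the function downloads several gigabytes and needs torch and transformers installed.

```python
def generate_reply(messages,
                   model_id="dizza01/Mistral-7B-Instruct-v0.2",
                   max_new_tokens=256):
    """Generate one assistant reply for a list of chat messages (sketch)."""
    # Imported lazily so that defining the function has no heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # float16 is an assumption to reduce memory use; adjust for your hardware.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to(device)

    # Render the conversation to text with the tokenizer's own chat template.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False)
    # The rendered prompt already carries special tokens, so none are added.
    inputs = tokenizer(
        prompt, return_tensors="pt", add_special_tokens=False
    ).to(device)

    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as generate_reply([{"role": "user", "content": "Summarize RoPE in one sentence."}]) would return the decoded reply as a string. Since the model has no built-in moderation, any output filtering would be applied to that returned string by the caller.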