kerolos1/Mistral-7B-Instruct-v0.1-Full-Final

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Apr 3, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

kerolos1/Mistral-7B-Instruct-v0.1-Full-Final is an instruction-tuned, 7-billion-parameter large language model developed by Mistral AI. It is based on the Mistral-7B-v0.1 architecture and uses Grouped-Query Attention and Sliding-Window Attention for efficient processing. The model was fine-tuned on publicly available conversation datasets, which makes it suitable for instruction-following tasks and general conversational applications. It supports a 4096-token context length and is intended as a quick demonstration of fine-tuning.


Model Overview

kerolos1/Mistral-7B-Instruct-v0.1-Full-Final is an instruction-tuned variant of the Mistral-7B-v0.1 base model, developed by the Mistral AI Team. This 7 billion parameter model is designed for instruction-following tasks, leveraging fine-tuning on various publicly available conversation datasets.

Key Architectural Features

This model incorporates advanced architectural choices from its base model, Mistral-7B-v0.1, to enhance performance and efficiency:

  • Grouped-Query Attention (GQA): Improves inference speed and reduces memory footprint.
  • Sliding-Window Attention (SWA): Limits each token's attention to a fixed-size window of preceding tokens, which bounds the cost of handling longer sequences. This listing serves the model with a 4096-token context length.
  • Byte-fallback BPE tokenizer: Provides robust tokenization across diverse text inputs.
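
To make the sliding-window idea concrete, the following is a minimal sketch of a causal sliding-window attention mask in plain Python. The function name sliding_window_mask and the boundary convention (a token may attend to itself and to the window - 1 tokens before it) are assumptions made for illustration; they are not taken from the model's implementation.

```python
def sliding_window_mask(seq_len, window):
    """Return a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must lie within the last
    `window` positions ending at i (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print a small mask: 1 = may attend, 0 = masked out.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("1" if allowed else "0" for allowed in row))
```

With a window of 3, each row has at most three allowed positions, so the per-token attention cost stays constant once the sequence is longer than the window.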

Instruction Format

To take advantage of the instruction fine-tuning, enclose each prompt in [INST] and [/INST] tokens. Only the first instruction should be preceded by the begin-of-sentence token; subsequent instructions should not be. This format is available through the tokenizer's apply_chat_template() method in Hugging Face Transformers, which simplifies integration.
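
The format described above can be sketched as a small prompt builder. This is an illustration only: build_prompt is a hypothetical helper, the special-token strings "<s>" and "</s>" are assumed, and the exact whitespace may differ from what the model's tokenizer template emits, so the tokenizer's own chat template remains the authoritative source.

```python
BOS, EOS = "<s>", "</s>"  # assumed begin/end-of-sentence token strings


def build_prompt(messages):
    """Format alternating user/assistant messages (hypothetical helper).

    The begin-of-sentence token is emitted once, before the first
    instruction only. Each user turn is wrapped in [INST] ... [/INST];
    each assistant turn is followed by the end-of-sentence token.
    """
    parts = [BOS]
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            parts.append(f" {message['content']}{EOS}")
    return "".join(parts)


if __name__ == "__main__":
    conversation = [
        {"role": "user", "content": "What is your favourite condiment?"},
        {"role": "assistant", "content": "Lemon juice."},
        {"role": "user", "content": "Do you have a recipe that uses it?"},
    ]
    print(build_prompt(conversation))
```

In practice, passing the same list of role/content dictionaries to the tokenizer's apply_chat_template() method (for example with tokenize=False to get the string back) avoids hand-formatting and picks up whatever template ships with the model.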

Limitations

As a quick demonstration of fine-tuning, the Mistral 7B Instruct model has no built-in moderation mechanisms. The developers are seeking community engagement on ways to add guardrails, so that the model can be deployed in environments that require moderated outputs.