alquimista888/mixtral_quantized

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · License: apache-2.0 · Architecture: Transformer · Open Weights

The alquimista888/mixtral_quantized model is an instruct fine-tuned version of the Mistral-7B-v0.2 Large Language Model, developed by Mistral AI. This 7 billion parameter model features an expanded 32k context window and is optimized for instruction-following tasks. It is designed for developers seeking a powerful yet efficient model for various natural language processing applications.


Overview

This model, alquimista888/mixtral_quantized, is an instruct fine-tuned variant of the Mistral-7B-v0.2 Large Language Model, originally developed by Mistral AI. It builds upon the Mistral-7B-v0.2 base, which introduced significant improvements over its predecessor, Mistral-7B-v0.1.

Key Enhancements from Mistral-7B-v0.1 to v0.2

  • Expanded Context Window: The model now supports a 32k context window, a substantial increase from the 8k context in v0.1, allowing for processing longer inputs and maintaining more conversational history.
  • Rope-theta Adjustment: The RoPE theta value is raised to 1e6, which accommodates the longer context window.
  • Sliding-Window Attention Removal: The v0.2 base model no longer utilizes Sliding-Window Attention.
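
If the repository ships a standard config.json, these changes can be checked programmatically. The snippet below is a minimal sketch, assuming the Hugging Face Transformers library and the standard MistralConfig field names (max_position_embeddings, rope_theta, sliding_window); a quantized repackaging may expose a different configuration.

```python
from transformers import AutoConfig

# Sketch: assumes this repo exposes a standard MistralConfig-style config.json.
config = AutoConfig.from_pretrained("alquimista888/mixtral_quantized")

print(config.max_position_embeddings)  # expected 32768 (32k context window)
print(config.rope_theta)               # expected 1e6
print(config.sliding_window)           # expected None (sliding-window attention removed)
```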

Instruction Format

To leverage the instruction fine-tuning, prompts must be wrapped in [INST] and [/INST] tokens. The first instruction should be preceded by a begin-of-sentence token ID; subsequent instructions should not. Generation is terminated by the end-of-sentence token ID. This format is applied automatically by the apply_chat_template() method in the Hugging Face Transformers library, as sketched below.
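
The following is a minimal usage sketch. It assumes the checkpoint loads through the standard AutoTokenizer/AutoModelForCausalLM classes and that the repository includes a chat template; an FP8-quantized checkpoint may additionally require a backend-specific loader or extra keyword arguments.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alquimista888/mixtral_quantized"

# Assumption: the repo loads with the default transformers classes; a quantized
# checkpoint may need an additional quantization backend or loading arguments.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "A good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# apply_chat_template wraps each user turn in [INST] ... [/INST] and adds the
# begin-of-sentence token only at the start, matching the format described above.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Generation stops once the model emits the end-of-sentence token, as noted above.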

Limitations

As an instruct fine-tuned model, it demonstrates compelling performance but currently has no built-in moderation mechanisms. The developers are engaging with the community on ways to add guardrails so the model can be deployed in environments that require moderated outputs.