plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba
The plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba model is a 7.2-billion-parameter instruction-tuned language model in the Mistral family, developed by Plawan Kumar Rath and Rahul Maliakkal. This variant is an MLX-format BF16 (uncompressed baseline) version of Mistral-7B-Instruct-v0.3, serving primarily as a reference artifact for research on quantization bias. It is intended for MLX environments, particularly for studying how quantization affects model behavior and bias emergence.
Overview
This model, plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba, is an MLX-format BF16 (uncompressed baseline) variant of the mistralai/Mistral-7B-Instruct-v0.3 instruction-tuned model. It features 7.2 billion parameters and is specifically re-serialized for direct loading and use within the MLX framework without additional conversion steps. This artifact was developed by Plawan Kumar Rath and Rahul Maliakkal and is one of 15 models used in their research paper, "Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels."
Key Characteristics
- Base Model: mistralai/Mistral-7B-Instruct-v0.3
- Parameters: 7.2 billion
- Precision: BF16 (uncompressed baseline)
- Format: MLX, targeting Apple Silicon environments.
- Research Context: Serves as the uncompressed reference for studying how quantization aggressiveness drives emergent stereotypical behavior, particularly on fairness-sensitive tasks such as the BBQ ambiguous-questions dataset (a sketch of producing quantized counterparts for such comparisons follows this list).
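For comparisons like those in the paper, quantized counterparts can be generated from this baseline with mlx-lm's convert utility. The sketch below is illustrative rather than the authors' released pipeline: the output directory name is hypothetical, and the supported bit widths depend on your mlx-lm version.

```python
# Illustrative sketch: quantize the BF16 baseline for side-by-side bias
# comparisons. Paths and bit width are examples, not the paper's pipeline.
from mlx_lm import convert

convert(
    "plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba",  # this BF16 baseline
    mlx_path="mistral-7b-instruct-v0.3-q4-mlx",          # hypothetical output dir
    quantize=True,
    q_bits=4,         # e.g., 4-bit; the paper also examines Q3 and Q8
    q_group_size=64,  # MLX's default quantization group size
)
```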
Research Findings
The associated paper reports a "dose-response" relationship between quantization aggressiveness and bias emergence: under Q3 quantization, 6.0–21.1% of items that were unbiased at BF16 became biased, versus only 0.1–0.9% under Q8. These shifts are largely invisible to perplexity (e.g., <0.5% shift at Q8, <3% at Q4), indicating that perplexity alone may not capture the fairness impact of quantization. Compressed instruction-tuned models should therefore be evaluated carefully before use in fairness-sensitive applications.
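To make the reported percentages concrete, here is a minimal sketch of a flip-rate metric in the spirit of the paper's analysis; the item IDs, label format, and function name are assumptions, not the authors' evaluation code.

```python
# Minimal sketch (assumed, not the paper's code): the fraction of items
# judged unbiased at BF16 that become biased after quantization.

def bias_flip_rate(bf16_biased: dict[str, bool], quant_biased: dict[str, bool]) -> float:
    unbiased_at_bf16 = [item for item, biased in bf16_biased.items() if not biased]
    if not unbiased_at_bf16:
        return 0.0
    flipped = sum(quant_biased[item] for item in unbiased_at_bf16)
    return flipped / len(unbiased_at_bf16)

# Toy example: 3 of 4 items are unbiased at BF16; one flips under Q3.
bf16 = {"q1": False, "q2": False, "q3": True, "q4": False}
q3   = {"q1": False, "q2": True,  "q3": True, "q4": False}
print(f"{bias_flip_rate(bf16, q3):.1%}")  # 33.3%
```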
Usage
This model can be loaded and used directly with the mlx-lm library for inference, making it straightforward to run Mistral-7B-Instruct-v0.3 at BF16 precision on Apple Silicon hardware.
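A minimal inference sketch follows, assuming mlx-lm is installed (pip install mlx-lm); it uses mlx-lm's standard load/generate API, and the prompt is a placeholder.

```python
# Minimal inference sketch with mlx-lm; the prompt is a placeholder.
from mlx_lm import load, generate

model, tokenizer = load("plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba")

prompt = "Explain BF16 precision in one sentence."

# Apply the model's chat template if one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```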