plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:May 5, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba model is an MLX-format BF16 (uncompressed baseline) variant of Mistral AI's Mistral-7B-Instruct-v0.3, featuring 7.2 billion parameters. Developed by Plawan Kumar Rath and Rahul Maliakkal, this artifact serves as a reference in research on quantization's impact on LLM bias. It is specifically formatted for Apple Silicon (MLX) and is crucial for studying bias emergence in compressed instruction-tuned models.

Loading preview...

Overview

This model, plawanrath/mistral-7b-instruct-v0.3-bf16-mlx-cba, is an MLX-formatted BF16 (uncompressed baseline) version of the mistralai/Mistral-7B-Instruct-v0.3 model. It contains 7.2 billion parameters and is specifically designed for use with Apple Silicon via the MLX framework. This particular artifact is one of 15 models used in the research paper "Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels" by Plawan Kumar Rath and Rahul Maliakkal.

Key Characteristics

  • Base Model: Mistral-7B-Instruct-v0.3 (Mistral family).
  • Parameters: 7.2 billion.
  • Precision: BF16 (uncompressed baseline), serving as a reference for quantization studies.
  • Format: MLX, optimized for Apple Silicon, allowing direct loading without extra conversion steps.
  • Research Context: This exact artifact was used to produce inference results in a paper investigating how quantization aggressiveness correlates with emergent stereotypical behavior on fairness-sensitive tasks (BBQ ambiguous questions).

Research Findings Highlighted

The associated paper reveals a "dose-response" relationship between quantization aggressiveness and increased bias. For instance, Q3 quantization showed 6.0–21.1% of BF16-unbiased items becoming biased, while Q8 showed 0.1–0.9%. These bias changes were largely invisible to perplexity shifts (<0.5% at Q8, <3% at Q4), underscoring the importance of considering bias in compressed models for fairness-sensitive applications.

Usage

This model can be loaded and used with the mlx-lm library for generation tasks, as demonstrated in the provided Python and CLI examples.