BlueMoonlight/Qwen3-4B-Instruct-2507-mlx-fp16
Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 11, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

BlueMoonlight/Qwen3-4B-Instruct-2507-mlx-fp16 is a 4 billion parameter instruction-tuned causal language model, converted by BlueMoonlight to the MLX format from the original Qwen3-4B-Instruct-2507. The conversion targets efficient deployment and inference on Apple Silicon via the MLX framework, making the model suitable for local execution on compatible hardware. It retains the core capabilities of Qwen3-4B-Instruct, focusing on general-purpose instruction following and text generation.


Model Overview

BlueMoonlight/Qwen3-4B-Instruct-2507-mlx-fp16 is a 4 billion parameter instruction-tuned language model, specifically adapted for the MLX framework. This model is a conversion of the original Qwen/Qwen3-4B-Instruct-2507, performed by BlueMoonlight using mlx-lm version 0.29.1. The primary purpose of this conversion is to enable efficient inference on Apple Silicon devices, leveraging the MLX ecosystem.
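Since the conversion was done with mlx-lm, the model can be loaded with the same library. A minimal sketch, assuming mlx-lm (>= 0.29.1) is installed via `pip install mlx-lm`; the import is guarded because MLX only runs on Apple Silicon, and the exact prompt and `max_tokens` value are illustrative:

```python
# Minimal local-inference sketch with mlx-lm. The import is guarded
# because the mlx/mlx-lm stack is only available on Apple Silicon.
try:
    from mlx_lm import load, generate
    HAVE_MLX = True
except ImportError:
    HAVE_MLX = False

# Chat turns in the standard role/content format consumed by the
# tokenizer's chat template.
messages = [
    {"role": "user", "content": "Explain the MLX framework in one sentence."}
]

if HAVE_MLX:
    # Downloads the weights from the Hugging Face Hub on first use.
    model, tokenizer = load("BlueMoonlight/Qwen3-4B-Instruct-2507-mlx-fp16")
    # apply_chat_template wraps the messages in Qwen's chat markers.
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
    print(text)
```

On non-Apple hardware the script simply skips generation rather than crashing, which keeps the sketch portable for testing.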

Key Characteristics

  • Parameter Count: 4 billion parameters, offering a balance between performance and computational requirements.
  • Instruction-Tuned: Designed to follow instructions effectively, making it suitable for a wide range of conversational and task-oriented applications.
  • MLX Format: Optimized for deployment and execution on Apple Silicon (e.g., M1, M2, M3 chips) via the MLX library, providing native performance benefits.
  • Context Length: Supports a 40,960-token context window, allowing longer prompts and documents to be processed while maintaining coherence.

Use Cases

This model is particularly well-suited for developers and users who:

  • Require a capable instruction-following model for local inference on Apple Silicon hardware.
  • Are working on applications that benefit from a large context window for complex queries or document processing.
  • Need a model for general text generation, summarization, question answering, and conversational AI tasks within the MLX ecosystem.
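For conversational use cases like the last bullet, a multi-turn loop just accumulates chat turns and re-applies the chat template each round. A hedged sketch under the same assumptions as above; `append_turn` is a hypothetical helper introduced here for illustration, not part of mlx-lm:

```python
# Sketch of a multi-turn chat loop on top of mlx-lm. `append_turn` is a
# hypothetical convenience helper; generation again requires mlx-lm on
# Apple Silicon, so the import is guarded.
try:
    from mlx_lm import load, generate
    HAVE_MLX = True
except ImportError:
    HAVE_MLX = False

def append_turn(history, role, content):
    """Return a new history list with one more chat turn appended."""
    return history + [{"role": role, "content": content}]

history = []
history = append_turn(history, "user", "List three uses of a 4B local model.")

if HAVE_MLX:
    model, tokenizer = load("BlueMoonlight/Qwen3-4B-Instruct-2507-mlx-fp16")
    # Re-render the full history each turn so the model sees prior context.
    prompt = tokenizer.apply_chat_template(history, add_generation_prompt=True)
    reply = generate(model, tokenizer, prompt=prompt, max_tokens=512)
    history = append_turn(history, "assistant", reply)
```

Keeping the history as plain role/content dicts means the same loop works with any tokenizer that exposes `apply_chat_template`.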