jedisct1/Qwen3-4B-Thinking-2507-mlx

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Aug 6, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The jedisct1/Qwen3-4B-Thinking-2507-mlx model is a 4 billion parameter language model, converted to the MLX format from the Qwen/Qwen3-4B-Thinking-2507 base model. It features a 32,768 token context length. This model is specifically designed for efficient deployment and inference on Apple Silicon, leveraging the MLX framework. Its primary utility lies in applications requiring a compact yet capable language model for local execution.

Loading preview...

jedisct1/Qwen3-4B-Thinking-2507-mlx Overview

This model is a 4 billion parameter language model, jedisct1/Qwen3-4B-Thinking-2507-mlx, which has been converted to the MLX format. The conversion was performed from the original Qwen/Qwen3-4B-Thinking-2507 base model using mlx-lm version 0.26.2. It supports a substantial context length of 32,768 tokens.

Key Capabilities

  • MLX Optimization: Specifically formatted for efficient inference on Apple Silicon, making it suitable for local development and deployment on compatible hardware.
  • Compact Size: With 4 billion parameters, it offers a balance between performance and resource consumption, ideal for scenarios where larger models are impractical.
  • Extended Context Window: A 32,768 token context length allows for processing and generating longer sequences of text, beneficial for complex tasks requiring extensive context.

Good For

  • Local Inference: Developers looking to run a capable language model directly on Apple Silicon devices without relying on cloud resources.
  • Experimentation: Rapid prototyping and testing of LLM applications in a local environment.
  • Resource-Constrained Environments: Use cases where computational resources are limited but a robust language model is still required.