Writer/palmyra-mini-thinking-a-MLX-BF16

Hugging Face · Text Generation · Open Weights

Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Sep 6, 2025 · License: apache-2.0 · Architecture: Transformer

Writer/palmyra-mini-thinking-a-MLX-BF16 is a 1.7 billion parameter language model based on the Qwen2 architecture, developed by Writer. Optimized for Apple Silicon via the MLX framework, this bfloat16 model is designed for explicit step-by-step reasoning marked by special tokens. It excels at advanced mathematical reasoning and competitive programming, with strong results on benchmarks such as MATH500 and Codeforces.


Overview

Writer/palmyra-mini-thinking-a-MLX-BF16 is a 1.7 billion parameter language model built on the Qwen2 architecture, developed by Writer. This version is optimized for Apple Silicon (M1, M2, M3, and M4 series) using the MLX framework and runs in bfloat16 precision. Its core differentiator is explicit step-by-step reasoning, emitted between dedicated `<think>` and `</think>` tokens.
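As a sketch of how running the model might look in practice with the `mlx-lm` package (the usual way to run MLX-converted checkpoints; the prompt and generation parameters below are illustrative assumptions, not values from the model card):

```python
# Requires: pip install mlx-lm (runs on Apple Silicon only)
from mlx_lm import load, generate

# Download the model and tokenizer from the Hugging Face Hub.
model, tokenizer = load("Writer/palmyra-mini-thinking-a-MLX-BF16")

# Build a chat-formatted prompt; the exact template ships with the tokenizer.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate; the response should contain a <think>...</think> reasoning block
# followed by the final answer.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```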

Key Capabilities

  • Explicit Reasoning: Uses special tokens to perform and display step-by-step thought processes, particularly useful for complex problem-solving (see the parsing sketch after this list).
  • Mathematical Reasoning: Achieves high scores on benchmarks such as MATH500 (0.886) and GSM8K (0.8287), indicating strong proficiency in advanced math.
  • Competitive Programming: Demonstrates aptitude on coding challenges, scoring 0.5631 (pass rate) on Codeforces and 0.5481 (extractive match) on OlympiadBench.
  • Optimized for Apple Silicon: Designed for efficient performance on Apple's M-series chips, requiring approximately 3.3GB of memory.
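A minimal sketch of separating the reasoning block from the final answer, assuming the model emits a single `<think>...</think>` span before its answer (the helper name and the sample response string are illustrative):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) around <think> tags.

    Hypothetical helper: assumes at most one <think>...</think> block.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Example with an illustrative response string:
reasoning, answer = split_thinking(
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>The answer is 408."
)
print("Reasoning:", reasoning)
print("Answer:", answer)
```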

Good For

  • Developers working on Apple Silicon who need a powerful, locally runnable model for reasoning tasks.
  • Applications requiring explicit, verifiable step-by-step problem-solving in mathematics or logic.
  • Tasks involving code generation and competitive programming challenges where detailed thought processes are beneficial.

Limitations

This model is platform-dependent: it is optimized exclusively for Apple Silicon and may not run on other hardware. Its explicit thinking mode increases response length and generation time, and the `use_cache: false` setting in the model configuration may slow inference.
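A small sketch for inspecting that setting before deployment, fetching the repository's config.json with huggingface_hub (the key name follows the model card's mention of `use_cache`; whether it can be safely overridden is an assumption worth verifying):

```python
import json
from huggingface_hub import hf_hub_download

# Fetch the model's config.json from the Hugging Face Hub.
config_path = hf_hub_download(
    repo_id="Writer/palmyra-mini-thinking-a-MLX-BF16",
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# The model card notes use_cache is disabled, which can slow autoregressive
# decoding since past key/value states are not reused.
print("use_cache:", config.get("use_cache"))
```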