Writer/palmyra-mini-thinking-a-MLX-BF16

Hugging Face · Text Generation · Open Weights

Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Sep 6, 2025 · License: apache-2.0 · Architecture: Transformer

Writer/palmyra-mini-thinking-a-MLX-BF16 is a 1.7 billion parameter language model based on the Qwen2 architecture, developed by Writer. Optimized for Apple Silicon via the MLX framework, this bfloat16 model is designed for explicit step-by-step reasoning marked by special tokens. It excels at advanced mathematical reasoning and competitive programming, with strong results on benchmarks such as MATH500 and Codeforces.


Overview

Writer/palmyra-mini-thinking-a-MLX-BF16 is a 1.7 billion parameter language model built on the Qwen2 architecture, developed by Writer. This version is optimized for Apple Silicon (M1, M2, M3, and M4 series) using the MLX framework and runs in bfloat16 precision. Its core differentiator is explicit step-by-step reasoning, emitted between dedicated `<think>` and `</think>` tokens.
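As a sketch of how running the model might look in practice with the `mlx-lm` package (the usual way to run MLX-converted checkpoints; the prompt and generation parameters below are illustrative assumptions, not values from the model card):

```python
# Requires: pip install mlx-lm (runs on Apple Silicon only)
from mlx_lm import load, generate

# Download the model and tokenizer from the Hugging Face Hub.
model, tokenizer = load("Writer/palmyra-mini-thinking-a-MLX-BF16")

# Build a chat-formatted prompt; the exact template ships with the tokenizer.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate; the response should contain a <think>...</think> reasoning block
# followed by the final answer.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```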

Key Capabilities

  • Explicit Reasoning: Uses special tokens to perform and display step-by-step thought processes, particularly useful for complex problem-solving (see the parsing sketch after this list).
  • Mathematical Reasoning: Achieves high scores on benchmarks such as MATH500 (0.886) and GSM8K (0.8287), indicating strong proficiency in advanced math.
  • Competitive Programming: Demonstrates aptitude on coding challenges, scoring 0.5631 (pass rate) on Codeforces and 0.5481 (extractive match) on OlympiadBench.
  • Optimized for Apple Silicon: Designed for efficient performance on Apple's M-series chips, requiring approximately 3.3GB of memory.
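A minimal sketch of separating the reasoning block from the final answer, assuming the model emits a single `<think>...</think>` span before its answer (the helper name and the sample response string are illustrative):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) around <think> tags.

    Hypothetical helper: assumes at most one <think>...</think> block.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Example with an illustrative response string:
reasoning, answer = split_thinking(
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>The answer is 408."
)
print("Reasoning:", reasoning)
print("Answer:", answer)
```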

Good For

  • Developers working on Apple Silicon who need a powerful, locally runnable model for reasoning tasks.
  • Applications requiring explicit, verifiable step-by-step problem-solving in mathematics or logic.
  • Tasks involving code generation and competitive programming challenges where detailed thought processes are beneficial.

Limitations

This model is platform-dependent: it is optimized exclusively for Apple Silicon and may not run on other hardware. Its explicit thinking mode increases response length and generation time, and the `use_cache: false` setting in the model configuration may slow inference.
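A small sketch for inspecting that setting before deployment, fetching the repository's config.json with huggingface_hub (the key name follows the model card's mention of `use_cache`; whether it can be safely overridden is an assumption worth verifying):

```python
import json
from huggingface_hub import hf_hub_download

# Fetch the model's config.json from the Hugging Face Hub.
config_path = hf_hub_download(
    repo_id="Writer/palmyra-mini-thinking-a-MLX-BF16",
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# The model card notes use_cache is disabled, which can slow autoregressive
# decoding since past key/value states are not reused.
print("use_cache:", config.get("use_cache"))
```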