Name: olaverse/MIST-Mini-8B-Thinking API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: olaverse

MIST-Mini-8B-Thinking Overview

MIST-Mini-8B-Thinking, developed by olaverse, is an 8-billion-parameter model built for step-by-step reasoning. It is a specialized version of the MIST-Mini-8B model that writes out its thought process before delivering an answer. This transparency comes from a 4-phase reinforcement learning process based on Group Relative Policy Optimization (GRPO).
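To make the "group relative" part of GRPO concrete, here is a minimal sketch of how rewards are commonly normalized within a group of completions sampled for the same prompt. The function name, the example rewards, and the use of the population standard deviation are illustrative assumptions; this page does not describe the exact formulation olaverse used.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each completion relative to the other completions in its group.

    Assumption: advantages are (reward - group mean) / group std, as GRPO is
    commonly described. eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four completions of one prompt (1.0 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced; those below average get a negative one. No separate value model is needed, since the group itself serves as the baseline.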

Key Capabilities & Training

Transparent Reasoning: The model explicitly shows its thinking steps within <think> tags, allowing users to verify the logic behind its answers.
Strong Mathematical Performance: The model reaches 95% accuracy on the GSM8K dataset after its specialized training, indicating strong performance on grade-school math word problems.
GRPO Training: The model was trained across four phases using datasets like OpenR1-Math-220k, Orca-Math-Word-Problems-200k, and GSM8K. Reward functions incentivized correct answers, structured reasoning steps, and proper use of <think> tags.
Efficiency: As an 8B parameter model, it is designed to run efficiently even on consumer-grade GPUs, with 4-bit quantized versions fitting in 6 GB of VRAM.
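The <think> tag convention and the reward signals described above can be illustrated with a short sketch. It separates the reasoning from the final answer, then scores a response on tag usage and correctness. The helper names, the reward weights (0.5 and 1.0), and the exact-match comparison are hypothetical choices for illustration, not the reward functions used in training, and the sketch does not cover the reward for structured reasoning steps.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None if no <think> block exists."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # everything after the closing tag
    return reasoning, answer

def toy_reward(text, expected_answer):
    """Hypothetical reward: 0.5 for a non-empty <think> block, 1.0 for a correct answer."""
    reasoning, answer = split_reasoning(text)
    format_score = 0.5 if reasoning else 0.0
    correct_score = 1.0 if answer == expected_answer else 0.0
    return format_score + correct_score

sample = "<think>3 apples plus 4 apples is 7 apples.</think>7"
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
print(toy_reward(sample, "7"))
```

The same split_reasoning helper is useful at inference time: an application can show the reasoning in a collapsible panel and pass only the final answer downstream.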

Ideal Use Cases

Explainable AI: When applications require not just an answer, but also a clear, verifiable explanation of how that answer was derived.
Mathematical Problem Solving: For tasks involving arithmetic, word problems, and other quantitative reasoning where accuracy and step-by-step logic are crucial.
Educational Tools: Can be used to demonstrate problem-solving methodologies in an interactive way.
Resource-Constrained Environments: Its 8B size and 4-bit quantization option make it suitable for deployment on hardware with limited VRAM.
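A rough back-of-the-envelope check shows why a 4-bit version can fit in 6 GB of VRAM. The sketch below assumes exactly 8 billion parameters and counts weight storage only; it ignores the KV cache, activations, and quantization metadata, all of which consume part of the remaining headroom.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption: a round 8e9 parameters; the true count may differ slightly.
four_bit = weight_memory_gb(8e9, 4)
sixteen_bit = weight_memory_gb(8e9, 16)
print(f"4-bit weights:  {four_bit:.1f} GB")
print(f"16-bit weights: {sixteen_bit:.1f} GB")
```

Under these assumptions the 4-bit weights take about 4 GB, leaving roughly 2 GB of a 6 GB card for everything else, while 16-bit weights alone would need about 16 GB.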
