olaverse/MIST-Mini-8B-Thinking

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 30, 2026 · License: llama3.1 · Architecture: Transformer

MIST-Mini-8B-Thinking by olaverse is an 8-billion-parameter, reasoning-focused language model derived from the MIST-Mini-8B family. It was trained with four phases of Group Relative Policy Optimization (GRPO) reinforcement learning to write out its step-by-step reasoning inside <think> tags before giving a final answer. The model is strongest at mathematical reasoning, with a reported 95% accuracy on GSM8K, and is designed for transparent, verifiable problem-solving on consumer GPUs.
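The card reports the GSM8K figure without describing the evaluation protocol. GSM8K is usually scored by exact match on the final numeric answer, and its reference solutions end with a line of the form `#### <answer>`. The sketch below assumes the model output has already been stripped down to the text after `</think>` and uses a last-number heuristic; the function names are hypothetical, and this is not necessarily how the 95% result was measured.

```python
import re
from typing import Optional

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def reference_number(solution: str) -> float:
    # GSM8K reference solutions end with a line of the form "#### <answer>".
    return float(solution.split("####")[-1].strip().replace(",", ""))


def predicted_number(answer: str) -> Optional[float]:
    # Heuristic (an assumption): the last number in the final answer is the prediction.
    numbers = NUMBER_RE.findall(answer.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def exact_match_accuracy(answers: list[str], solutions: list[str]) -> float:
    # Fraction of problems whose predicted number equals the reference number.
    correct = sum(
        predicted_number(a) == reference_number(s) for a, s in zip(answers, solutions)
    )
    return correct / len(solutions)


# Toy data, not real GSM8K items; `answers` holds only the text after </think>.
solutions = ["2 + 3 = 5\n#### 5", "10 * 12 = 120\n#### 120"]
answers = ["The answer is 5.", "So the total is 100."]
print(exact_match_accuracy(answers, solutions))  # 0.5
```

Scoring only the text after `</think>` keeps intermediate numbers in the reasoning from being mistaken for the final answer.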


MIST-Mini-8B-Thinking Overview

MIST-Mini-8B-Thinking, developed by olaverse, is an 8-billion-parameter language model specialized for transparent reasoning. It is a variant of MIST-Mini-8B, fine-tuned with a four-phase Group Relative Policy Optimization (GRPO) reinforcement learning process. This training teaches the model to write out its reasoning inside <think> tags before delivering a final answer, so users can inspect each step and check how the answer was reached.
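The card does not document an output parser. The following minimal sketch separates the reasoning from the answer, assuming each completion contains a single `<think>...</think>` block followed by the final answer. `split_thinking` is a hypothetical helper, not part of any published API for this model.

```python
import re

# Assumption: the reasoning sits in one <think>...</think> block; DOTALL lets "." match newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Hypothetical helper: everything after the closing </think> tag is treated
    as the final answer. Without a complete block, the whole text is the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


completion = "<think>3 apples + 4 apples = 7 apples.</think>\nThe answer is 7."
reasoning, answer = split_thinking(completion)
print(reasoning)  # 3 apples + 4 apples = 7 apples.
print(answer)     # The answer is 7.
```

Falling back to treating the whole text as the answer means a truncated or malformed completion is still usable, though its reasoning is then unavailable for review.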

Key Capabilities and Features

  • Transparent Reasoning: Explicitly shows step-by-step thinking using <think> tags, allowing users to follow and verify the model's logic.
  • Strong Mathematical Performance: Achieves 95% accuracy on GSM8K, a benchmark of grade-school math word problems, after its GRPO training.
  • Efficient and Accessible: At 8B parameters, the model is designed to run on consumer-grade GPUs, and 4-bit quantized versions fit in as little as 6GB of VRAM.
  • GRPO Training: Uses reinforcement learning with dedicated reward functions for correct answers, structured reasoning steps, and proper use of the <think> format. GRPO scores each sampled completion relative to the other completions generated for the same prompt.
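The card names the three reward signals but does not publish their definitions, weights, or the four-phase schedule. The sketch below shows how such rewards could feed GRPO's group-relative advantages. Every reward function, the 0.5 weights, and the "three lines" threshold are hypothetical. Only the advantage step is shown: each reward is normalized by the mean and standard deviation of its own group. Full GRPO also applies a clipped policy-gradient update and a KL penalty, both omitted here.

```python
import re
import statistics

# Completion must open with a non-empty <think>...</think> block followed by answer text.
FORMAT_RE = re.compile(r"^\s*<think>.+?</think>\s*\S", re.DOTALL)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def format_reward(completion: str) -> float:
    """Hypothetical: 1.0 if the completion uses the <think> format, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def steps_reward(completion: str) -> float:
    """Hypothetical proxy for structured reasoning: full credit for >= 3 non-empty lines in <think>."""
    match = THINK_RE.search(completion)
    if match is None:
        return 0.0
    steps = [line for line in match.group(1).splitlines() if line.strip()]
    return min(len(steps) / 3, 1.0)


def correctness_reward(completion: str, reference: float) -> float:
    """Hypothetical: 1.0 if the last number after </think> equals the reference answer."""
    answer = completion.split("</think>")[-1].replace(",", "")
    numbers = NUMBER_RE.findall(answer)
    return 1.0 if numbers and float(numbers[-1]) == reference else 0.0


def total_reward(completion: str, reference: float) -> float:
    # Hypothetical weighting: correctness dominates, format and structure add shaping.
    return (correctness_reward(completion, reference)
            + 0.5 * format_reward(completion)
            + 0.5 * steps_reward(completion))


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A toy group of four completions sampled for the same prompt (reference answer 20).
group = [
    "<think>\nStep 1: 12 / 3 = 4\nStep 2: 4 * 5 = 20\nStep 3: check 20\n</think>\nThe answer is 20.",
    "<think>12 / 3 = 4, 4 * 5 = 20</think> 20",
    "The answer is 20.",
    "<think>\nGuess.\n</think>\nThe answer is 15.",
]
rewards = [total_reward(c, 20.0) for c in group]
advantages = group_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

Because advantages are relative to the group, a correct answer without `<think>` still scores below a correct, well-structured one, which is how format and structure rewards can push the policy toward visible reasoning without a separate value model.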

Ideal Use Cases

  • Educational Tools: For applications requiring step-by-step explanations in math or logic.
  • Problem Solving: Scenarios where not just the answer, but also the method to arrive at it, is crucial.
  • Auditable AI: Use cases demanding transparency and verifiability of the model's decision-making process.
  • Resource-Constrained Environments: Its efficient size allows deployment on consumer hardware.
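A back-of-envelope check of the hardware claims: weight memory is roughly parameters × bits per parameter / 8 bytes. This sketch assumes a nominal 8e9 parameters, since the card does not give an exact count, and `weight_memory_gib` is a hypothetical helper. It counts weights only. The KV cache, which grows with context length up to the 32k limit, plus activations and runtime overhead, presumably account for the rest of the 6GB budget quoted for 4-bit versions.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB (excludes KV cache and overhead)."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: a nominal 8e9 parameters; the exact count is not given on the card.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(8e9, bits):.2f} GiB")
# 16-bit weights: 14.90 GiB
#  8-bit weights: 7.45 GiB
#  4-bit weights: 3.73 GiB
```

The 8-bit row corresponds to the FP8 quantization listed in the metadata. The 4-bit row leaves a little over 2 GiB of a 6GB card for everything other than weights.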