olaverse/MIST-Mini-8B

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 30, 2026 | License: llama3.1 | Architecture: Transformer

olaverse/MIST-Mini-8B is an 8 billion parameter model from the MIST family, built by olaverse by blending four specialized Llama 3.1 8B models with the DARE+TIES merge method. It is optimized for speed, reaching approximately 63 tokens/second on an H200 GPU, while maintaining strong performance in reasoning, coding, and mathematical tasks. With a footprint of about 15GB in bfloat16 precision, it targets real-time applications and efficient deployment on consumer GPUs.

MIST-1-8B: A Fast and Capable 8B Model

MIST-1-8B, also known as MIST-Mini, is the smallest and fastest offering in the MIST model family developed by olaverse. This 8 billion parameter model is constructed by blending four specialized Llama 3.1 8B models using the DARE+TIES method, aiming to deliver robust performance at high inference speeds.
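The card does not publish the merge recipe (which four models, drop rate, or per-model weights), so the following is only a toy sketch of the general DARE+TIES idea on flat lists of floats, not olaverse's actual pipeline. It assumes the common formulation in which DARE randomly drops entries of each task vector (fine-tuned weights minus base weights) and rescales the survivors, and a TIES-style step then elects a sign per parameter and averages only the entries that agree with it. The function names, the `drop_prob` default, and the equal weighting of models are all illustrative assumptions.

```python
import random

def dare(delta, drop_prob, rng):
    """DARE step: drop each entry with probability drop_prob and rescale
    the survivors by 1 / (1 - drop_prob) so the expected value is unchanged."""
    scale = 1.0 / (1.0 - drop_prob)
    return [d * scale if rng.random() >= drop_prob else 0.0 for d in delta]

def ties_sign_merge(deltas):
    """TIES-style step: per parameter, elect the sign with the larger total
    mass, then average only the entries that share that sign."""
    merged = []
    for column in zip(*deltas):
        total = sum(column)
        if total == 0:
            # No dominant direction (all zero or perfectly cancelling).
            merged.append(0.0)
            continue
        agreeing = [d for d in column if d * total > 0]
        merged.append(sum(agreeing) / len(agreeing))
    return merged

def dare_ties_merge(base, finetuned, drop_prob=0.5, seed=0):
    """Merge several fine-tuned weight vectors back onto a shared base.
    drop_prob and equal model weighting are assumptions for illustration."""
    rng = random.Random(seed)
    deltas = [
        dare([f - b for f, b in zip(ft, base)], drop_prob, rng)
        for ft in finetuned
    ]
    return [b + m for b, m in zip(base, ties_sign_merge(deltas))]

# Toy usage: two "fine-tuned" models sharing a three-parameter base.
base = [0.0, 0.0, 0.0]
models = [[1.0, -1.0, 2.0], [3.0, 1.0, -2.0]]
merged = dare_ties_merge(base, models, drop_prob=0.0)
```

With `drop_prob=0.0` the DARE step is a no-op, which makes the sign election easy to inspect: the first parameter averages two agreeing updates, while the other two cancel and stay at the base value. Real merges operate on full tensors per layer and usually apply per-model weights and densities.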

Key Capabilities

  • Exceptional Speed: Achieves approximately 63 tokens/second on an H200 GPU, making it suitable for real-time applications.
  • Strong Reasoning: Benefits from DeepSeek R1 distillation, contributing to its reasoning abilities.
  • Clean Code Generation: Produces production-ready code, often including comments.
  • Accurate Math Solving: Capable of accurate, step-by-step mathematical problem-solving.
  • Helpful and Low Refusal: Designed to be helpful with a low refusal rate in responses.
  • Lightweight Deployment: Requires about 15GB of VRAM in bfloat16 precision (e.g., RTX 3090/4090) or about 6GB with 4-bit quantization (e.g., RTX 3060+), enabling deployment on consumer-grade GPUs.
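The memory and speed figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a nominal 8.0e9 parameters and counts weights only; the helper names are hypothetical, and real usage adds KV cache, activations, and quantization metadata, which is one plausible reason the quoted 4-bit figure is higher than the raw weight size.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30

def seconds_to_generate(n_tokens, tokens_per_second):
    """Decode time at a steady throughput, ignoring prompt processing."""
    return n_tokens / tokens_per_second

N_PARAMS = 8.0e9  # assumption: nominal parameter count for an "8B" model

bf16_gib = weight_memory_gib(N_PARAMS, 16)  # roughly 14.9 GiB
int4_gib = weight_memory_gib(N_PARAMS, 4)   # roughly 3.7 GiB

# At the quoted ~63 tokens/second, a 500-token reply takes about 8 seconds.
reply_seconds = seconds_to_generate(500, 63)

print(f"bf16 weights:  {bf16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
print(f"500 tokens at 63 tok/s: {reply_seconds:.1f} s")
```

The bfloat16 estimate lines up with the quoted 15GB, which suggests that number refers to weights and leaves little headroom on a 16GB card once context length grows.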

Good For

  • Applications requiring high-speed inference and real-time responses.
  • Tasks involving complex reasoning and problem-solving.
  • Code generation and mathematical computations.
  • Deployment on hardware with limited VRAM, such as consumer GPUs.