olaverse/MIST-Mini-8B

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: May 30, 2026
  • License: llama3.1
  • Architecture: Transformer

MIST-Mini-8B, developed by olaverse, is the smallest and fastest model in the MIST family. It is built by merging four specialized Llama 3.1 8B models with DARE+TIES. The 8-billion-parameter model is optimized for speed, reaching about 63 tokens/second on an H200, while offering strong reasoning, clean code generation, and accurate mathematical problem-solving. It needs only about 15GB of VRAM in bfloat16 precision, which makes it suitable for real-time applications and consumer-grade GPUs.


MIST-Mini-8B: Fast and Capable 8B Model

MIST-Mini-8B, now known as MIST-1-8B, is the most compact and fastest model in the MIST family by olaverse. It is built by merging four specialized Llama 3.1 8B models with the DARE+TIES technique, aiming for a balance of quality and speed.
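To make the merging step more concrete, here is a toy sketch of a DARE+TIES-style merge on flat lists of weights. It is not the recipe olaverse used: the function names, the drop probability, and the seed are all hypothetical, and real merges operate on full model tensors with per-model weights and densities. The idea is that DARE randomly drops entries of each fine-tuned model's delta from the base and rescales the survivors, and a TIES-style step then elects a sign per parameter and averages only the deltas that agree with it.

```python
import random


def dare(delta, drop_prob, rng):
    """DARE step: randomly drop entries of a delta and rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    # Dividing by `keep` preserves the expected value of each entry.
    return [d / keep if rng.random() < keep else 0.0 for d in delta]


def ties_merge(deltas):
    """TIES-style step: elect a sign per parameter, average agreeing entries."""
    merged = []
    for column in zip(*deltas):
        # The elected sign is the sign of the summed deltas (ties go positive).
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base, finetuned, drop_prob=0.5, seed=0):
    """Merge several fine-tuned weight lists into one, relative to `base`."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    deltas = [
        dare([f - b for f, b in zip(model, base)], drop_prob, rng)
        for model in finetuned
    ]
    return [b + m for b, m in zip(base, ties_merge(deltas))]


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0]
    experts = [[1.0, -1.0, 2.0], [3.0, 1.0, -4.0]]
    print(dare_ties(base, experts, drop_prob=0.0))
```

With `drop_prob=0.0` nothing is dropped, so the result depends only on the sign election: conflicting deltas are resolved in favor of the sign of their sum instead of cancelling each other out.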

Key Capabilities

  • Exceptional Speed: Achieves approximately 63 tokens/second on an H200, making it highly suitable for real-time applications.
  • Strong Reasoning: Benefits from DeepSeek R1 distillation, contributing to robust reasoning abilities.
  • Code Generation: Produces clean, production-ready code with comments.
  • Mathematical Proficiency: Capable of accurate, step-by-step mathematical problem-solving.
  • Helpful & Lightweight: Exhibits a low refusal rate and needs only about 15GB of VRAM in bfloat16, so it runs on consumer GPUs such as the RTX 3090/4090.

Hardware Requirements

Running the model in bfloat16 precision takes roughly 15 to 16GB of VRAM (e.g., RTX 3090/4090). With 4-bit quantization, about 6GB is enough (e.g., RTX 3060 or better).
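As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the parameter count and the bits per parameter. The sketch below assumes a nominal 8 billion parameters and counts weights only. Activations, the KV cache, and quantization metadata add overhead on top, which is presumably why the quoted requirements are somewhat higher than the raw weight sizes.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Return the memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 8e9  # nominal parameter count (assumption)
    for label, bits in [("bfloat16", 16), ("FP8", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions the bfloat16 weights come to about 14.9 GiB and the 4-bit weights to about 3.7 GiB, which is consistent with the 15 to 16GB and 6GB figures once runtime overhead is included.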