olaverse/MIST-1-70B

Text Generation
Concurrency Cost: 4
Model Size: 70B
Quant: FP8
Ctx Length: 8k
Tool Calling: Supported
Published: May 30, 2026
License: llama3.1
Architecture: Transformer
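As a rough check on the memory figures for a 70B model, the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below assumes exactly 70 × 10^9 parameters and decimal gigabytes (10^9 bytes). It ignores the KV cache, activations, and runtime overhead, so real deployments need headroom above these numbers. The helper name weight_memory_gb is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in decimal GB (1 GB = 10**9 bytes).

    Ignores KV cache, activations, and framework overhead.
    """
    total_bytes = num_params * bits_per_weight // 8
    return total_bytes / 10**9


# Assumption: exactly 70 billion parameters.
PARAMS = 70_000_000_000

for label, bits in [("bfloat16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB of weights")
```

This gives about 140 GB for bfloat16, 70 GB for FP8, and 35 GB for 4-bit weights. The extra memory needed in practice at 4 bits comes from the KV cache and runtime overhead.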

MIST-1-70B is a 70-billion-parameter model from olaverse's MIST family, built by merging four Llama 3.1 70B models with the DARE+TIES method. It has a 128K-token context window and is tuned for reasoning, coding, and mathematics. It aims to be highly helpful, supports more than eight languages, and gives structured, detailed responses.


MIST-1-70B: A Merged Llama 3.1 Model for Advanced Reasoning

MIST-1-70B is a 70-billion-parameter model developed by olaverse as part of its MIST model family. It is built by merging four high-performing Llama 3.1 70B models with the DARE+TIES method: DARE randomly drops and rescales each model's fine-tuning deltas to remove redundant weights, and TIES resolves sign conflicts between the models before averaging, so the merged weights combine the parents' strengths. The goal is a production-ready model that performs well across a range of domains.
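To make the merge method concrete, here is a toy sketch of DARE+TIES over flat lists of floats standing in for a single weight tensor. A delta is a fine-tuned weight minus the base weight. DARE zeroes each delta with probability drop_rate and rescales the survivors by 1 / (1 - drop_rate); TIES then elects a sign per parameter from the summed deltas and averages only the non-zero deltas that agree with it. The function names, equal model weighting, and default drop rate are illustrative assumptions, not olaverse's actual merge recipe; tools such as mergekit apply the same idea per tensor with per-model weights and densities.

```python
import random
from typing import List

Vector = List[float]


def dare(delta: Vector, drop_rate: float, rng: random.Random) -> Vector:
    """DARE: zero each entry with probability drop_rate, rescale survivors by 1 / (1 - drop_rate)."""
    keep_prob = 1.0 - drop_rate
    return [d / keep_prob if rng.random() < keep_prob else 0.0 for d in delta]


def ties_merge(deltas: List[Vector]) -> Vector:
    """TIES sign election plus disjoint mean over (already sparsified) deltas."""
    merged = []
    for values in zip(*deltas):
        # Elect the sign of the summed deltas (ties go to positive).
        elected_positive = sum(values) >= 0.0
        # Average only non-zero deltas whose sign matches the elected sign.
        agreeing = [v for v in values if v != 0.0 and (v > 0.0) == elected_positive]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base: Vector, finetuned: List[Vector],
              drop_rate: float = 0.5, seed: int = 0) -> Vector:
    """Merge several fine-tuned copies of `base` with equal weights (illustrative only)."""
    rng = random.Random(seed)
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    sparse = [dare(d, drop_rate, rng) for d in deltas]
    merged_delta = ties_merge(sparse)
    return [b + d for b, d in zip(base, merged_delta)]
```

With drop_rate=0.0 no deltas are dropped, so the TIES step is easy to trace: dare_ties([0.0, 0.0, 0.0], [[1.0, -1.0, 2.0], [3.0, 1.0, -0.5], [2.0, -2.0, 0.0]], drop_rate=0.0) returns [2.0, -1.5, 2.0]. In the second position the negative deltas outweigh the positive one, so only -1.0 and -2.0 are averaged.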

Key Capabilities

  • Strong Reasoning: Inherits advanced reasoning from a DeepSeek R1-distilled parent model.
  • High Helpfulness: Built on parent models that score well on helpfulness benchmarks, such as Nemotron.
  • Coding Proficiency: Generates clean, documented, and production-ready code.
  • Mathematical Problem Solving: Offers step-by-step, structured solutions for math problems.
  • Multilingual Support: Understands and generates text in more than eight languages.
  • Long Context Window: A 128K-token context window for processing long inputs.
  • Unrestricted Instruction Following: Designed to follow instructions without excessive refusals.

Usage Notes

Always format prompts with the tokenizer's built-in tokenizer.apply_chat_template. Because the parent models mix Llama 3.1 and ChatML prompt formats, hard-coded prompt templates can cause stray tokens such as <|im_end|> to appear in the output. The model can run in bfloat16 (about 140 GB of VRAM) or with 4-bit quantization (about 40 GB of VRAM).
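A minimal sketch of the recommended prompt path, assuming a Hugging Face transformers tokenizer loaded for this repository. build_prompt only delegates to tokenizer.apply_chat_template, so formatting comes from the template shipped with the model rather than a hard-coded string. strip_end_of_turn is an optional safeguard that this card does not itself prescribe: it cuts generated text at the first end-of-turn marker from either parent format, <|eot_id|> (Llama 3.1) or <|im_end|> (ChatML). Both helper names are hypothetical.

```python
from typing import Any, Dict, List

# End-of-turn markers from the two prompt formats in the model's lineage.
END_OF_TURN_MARKERS = ("<|eot_id|>", "<|im_end|>")


def build_prompt(tokenizer: Any, messages: List[Dict[str, str]]) -> str:
    """Format a conversation with the model's own chat template (hypothetical helper).

    `tokenizer` is expected to be a transformers tokenizer, for example one loaded
    with AutoTokenizer.from_pretrained("olaverse/MIST-1-70B").
    """
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return a string instead of token IDs
        add_generation_prompt=True,  # append the header that opens the assistant turn
    )


def strip_end_of_turn(text: str) -> str:
    """Truncate generated text at the earliest end-of-turn marker, if any (hypothetical helper)."""
    cut = len(text)
    for marker in END_OF_TURN_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()
```

For example, strip_end_of_turn("The answer is 4.<|im_end|>\n") returns "The answer is 4.". The cleanup step only hides symptoms; using the chat template for every prompt remains the primary fix.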

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p