olaverse/MIST-1-70B
MIST-1-70B is a 70-billion-parameter language model from olaverse and part of the MIST model family. It was built by merging four leading Llama 3.1 70B models with the DARE+TIES method. It has a 128K-token context window and targets reasoning, coding, and mathematical problem-solving. The model is intended for production applications that need strong multilingual performance and complex instruction following.
MIST-1-70B Overview
MIST-1-70B is a 70-billion-parameter model developed by olaverse as part of their MIST model family. It is constructed by blending four leading Llama 3.1 70B models with the DARE+TIES merge method. DARE prunes redundant weights by randomly dropping a share of each model's weight changes relative to the base and rescaling the rest, while TIES resolves the remaining conflicts through sign consensus. The aim is to combine the strongest capabilities of the constituent models.
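To make the merge method more concrete, here is a minimal sketch of DARE+TIES on toy weight vectors. It is not the recipe used to build MIST-1-70B: the function names (`dare`, `ties_merge`, `dare_ties`), the drop probability, and the flat-list representation of weights are all assumptions made for illustration, and real merges operate on full model tensors with dedicated tooling.

```python
import random


def dare(delta, drop_prob, rng):
    """DARE step: randomly drop entries of a weight delta, rescale survivors.

    Survivors are multiplied by 1 / (1 - drop_prob) so the expected value
    of each entry is unchanged.
    """
    scale = 1.0 / (1.0 - drop_prob)
    return [d * scale if rng.random() >= drop_prob else 0.0 for d in delta]


def ties_merge(deltas):
    """TIES step: elect a sign per parameter, then average agreeing entries."""
    merged = []
    for column in zip(*deltas):
        # The elected sign is the sign of the summed deltas for this parameter.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base, finetuned_models, drop_prob=0.5, seed=0):
    """Merge fine-tuned weight vectors into one (hypothetical toy helper)."""
    rng = random.Random(seed)
    # Deltas ("task vectors") are the differences from the shared base model.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned_models]
    sparse = [dare(d, drop_prob, rng) for d in deltas]
    merged_delta = ties_merge(sparse)
    return [b + d for b, d in zip(base, merged_delta)]


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0]
    models = [[1.0, -1.0, 2.0], [3.0, -2.0, -1.0]]
    # With drop_prob=0.0 nothing is pruned, so only the sign consensus acts.
    print(dare_ties(base, models, drop_prob=0.0))
```

In the third parameter the two toy models disagree in sign, so only the entry matching the elected sign contributes to the merged value. This is the conflict resolution that sign consensus refers to.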
Key Capabilities
- Strong Reasoning: Achieved through DeepSeek R1 distillation, supporting step-by-step problem-solving.
- Highly Helpful: Built on Nemotron, scoring high on helpfulness benchmarks.
- Coding & Math: Delivers clean, documented code with type hints and structured, verifiable mathematical solutions.
- Multilingual Support: Capable of processing and generating content in 8+ languages.
- Long Context Window: Features an extended 128K token context window for handling extensive inputs.
- Unrestricted Instruction Following: Designed to follow instructions precisely without excessive refusals.
Use Cases & Performance
MIST-1-70B targets production environments and is designed to give detailed, accurate responses. It is reported to perform strongly in reasoning, coding, math, and general instruction following, with an average generation speed of 23 tokens/second. Running it requires about 140 GB of VRAM at bfloat16 precision, or about 40 GB with 4-bit quantization.
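The memory and speed figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes exactly 70 billion parameters, counts weight storage only, and uses hypothetical helper names. On that basis, 4-bit weights alone come to about 35 GB, so the 40 GB figure presumably leaves headroom for quantization metadata, activations, and the KV cache (this interpretation is an assumption, not something the model card states).

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def generation_seconds(num_tokens, tokens_per_second=23.0):
    """Approximate time to generate num_tokens at a steady decoding speed."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    params = 70_000_000_000  # assumed exact parameter count
    print(f"bfloat16 weights: {weight_memory_gb(params, 16):.0f} GB")
    print(f"4-bit weights:    {weight_memory_gb(params, 4):.0f} GB")
    print(f"500-token reply:  {generation_seconds(500):.1f} s")
```

At the stated average of 23 tokens/second, a 500-token response would take a little under 22 seconds, which is worth factoring into latency budgets for interactive use.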