dphn/Dolphin3.0-R1-Mistral-24B

Text generation · Model size: 24B · Quantization: FP8 · Context length: 32k · Published: Feb 6, 2025 · Architecture: Transformer

Dolphin3.0-R1-Mistral-24B is a 24-billion-parameter instruct-tuned model from the Dolphin 3.0 series, developed by Eric Hartford, Ben Gitter, BlouseJury, and Cognitive Computations. Built on the Mistral architecture with a 32,768-token context length, it is designed as a general-purpose local model that excels at reasoning, coding, math, and agentic tasks. This R1 version is trained on 800k reasoning traces to strengthen its general-purpose reasoning, aiming to provide a steerable alternative to proprietary models.


Dolphin 3.0 R1 Mistral 24B Overview

Dolphin 3.0 R1 Mistral 24B is an advanced instruct-tuned model, part of the Dolphin 3.0 Collection, developed by Eric Hartford, Ben Gitter, BlouseJury, and Cognitive Computations. This 24-billion-parameter model is engineered as a versatile, general-purpose local AI, supporting a wide array of applications including coding, mathematical problem-solving, agentic workflows, function calling, and general conversation. Its 32,768-token context length makes it suitable for complex tasks that require extensive context.
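As an instruct-tuned model, it expects prompts in a chat format. The sketch below assumes the ChatML-style template commonly used by Dolphin releases; the helper name is hypothetical, and in practice a tokenizer's `apply_chat_template` would produce this string for you:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style chat prompt (hypothetical helper;
    assumes the ChatML template used by recent Dolphin releases)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# The system prompt is fully user-controlled, per the steerability goals below.
prompt = build_chatml_prompt(
    "You are Dolphin, a helpful AI assistant.",
    "Write a function that reverses a linked list.",
)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to begin its reply.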

Key Capabilities & Differentiators

  • Enhanced Reasoning: The R1 version has undergone 3 epochs of training on 800k reasoning traces from the Dolphin-R1 dataset, significantly boosting its general-purpose reasoning abilities.
  • User Control & Steerability: Unlike proprietary models, Dolphin emphasizes user control over system prompts and alignment, allowing developers to define ethics and guidelines without external imposition. This ensures data privacy and application-specific customization.
  • General Purpose: Designed to function as a comprehensive reasoning instruct model, akin to the capabilities found in leading commercial models like ChatGPT, Claude, and Gemini, but with the advantage of local deployment and user-defined control.
  • Optimized for Low Temperature: Experimental observations suggest optimal performance at a low temperature setting (0.05 to 0.1), which helps prevent issues such as second-guessing and excessive self-correction.
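The low-temperature guidance above can be captured in a sampling configuration. This is a minimal sketch: the field names mirror common OpenAI-compatible / `transformers` generation parameters, the `top_p` value is illustrative rather than from the model card, and the clamping helper is hypothetical:

```python
# Sampling settings following the 0.05-0.1 temperature recommendation.
# Parameter names assume an OpenAI-compatible serving stack (assumption).
generation_config = {
    "temperature": 0.1,   # recommended band is 0.05-0.1
    "top_p": 0.95,        # illustrative value, not from the model card
    "max_tokens": 1024,
}

def clamp_temperature(cfg: dict, lo: float = 0.05, hi: float = 0.1) -> dict:
    """Hypothetical helper: force temperature into the recommended band."""
    out = dict(cfg)
    out["temperature"] = min(max(out.get("temperature", hi), lo), hi)
    return out
```

Clamping at request time guards against callers passing a default temperature (often 0.7+) that would trigger the self-correction behavior noted above.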

Training & Data

The model's development drew on various open-source datasets, including those from OpenCoder-LLM, Microsoft (orca-agentinstruct, orca-math-word-problems), NousResearch (hermes-function-calling), AI-MO (NuminaMath), allenai (tulu-3-sft-mixture), and HuggingFaceTB (smoltalk). The training process also used a reward model from RLHFlow for dataset filtering and leveraged Deepseek-V3 for data augmentation.