dphn/dolphin-2.9.3-mistral-nemo-12b

Status: Warm
Visibility: Public
Parameters: 12B
Quantization: FP8
Context length: 32768 tokens
License: apache-2.0
Hugging Face

Overview

Dolphin 2.9.3 Mistral Nemo 12b is a 12-billion-parameter language model fine-tuned by Eric Hartford and Cognitive Computations. It is built on the mistralai/Mistral-Nemo-Base-2407 base model and uses the ChatML prompt template. The model was fine-tuned with a sequence length of 8192 tokens, though the base model supports a context window of up to 128K tokens.
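
The ChatML template wraps each turn in <|im_start|> and <|im_end|> markers. Below is a minimal usage sketch with the Hugging Face transformers library, assuming the tokenizer ships the ChatML chat template; the repository name is the one listed on this page, and the system prompt and generation settings are only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dphn/dolphin-2.9.3-mistral-nemo-12b"  # repository name as listed above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Write a short haiku about the ocean."},
]

# apply_chat_template renders the ChatML format, e.g.:
# <|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```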

Key Capabilities

  • Instruction Following: Designed to be highly compliant with user instructions.
  • Conversational AI: Capable of engaging in natural dialogue.
  • Coding Skills: Handles code generation and code translation.
  • Agentic Abilities: Possesses initial capabilities for autonomous task execution.
  • Function Calling: Supports integration with external tools and functions.
  • Uncensored: The model is intentionally uncensored and will comply with all requests, including potentially unethical ones. Users are advised to implement their own alignment layer before exposing it as a service (a minimal sketch follows this list).
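
Because the model itself applies no refusals, any public-facing deployment needs its own guardrails. The sketch below is purely illustrative; the is_allowed policy check is a hypothetical placeholder for whatever moderation approach you actually use.

```python
def is_allowed(user_message: str) -> bool:
    # Hypothetical policy check: replace with your own moderation model,
    # keyword rules, or an external moderation API.
    blocked_terms = ("example-banned-topic",)
    return not any(term in user_message.lower() for term in blocked_terms)


def guarded_chat(generate_fn, user_message: str) -> str:
    # generate_fn is whatever callable sends the ChatML prompt to the model
    # and returns its reply (e.g. the usage sketch in the Overview section).
    if not is_allowed(user_message):
        return "Sorry, I can't help with that request."
    return generate_fn(user_message)
```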

Training Details

The model was trained using Axolotl on a diverse set of ShareGPT-formatted datasets, with a focus on system chat, multilingual chat, code generation, code feedback, mathematical reasoning (Orca-Math), and agentic instruction sets (agent_instruct_react, toolbench). Training ran for 3 epochs with a learning rate of 5e-6 and a total batch size of 128 across 8 GPUs. Key layers, including lm_head, embed_tokens, and various mlp and self_attn components, were unfrozen during training.
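
The selective unfreezing described above is configured in Axolotl; the snippet below is a rough transformers-level equivalent of that parameter selection, not the actual training script, and assumes access to the base model checkpoint.

```python
from transformers import AutoModelForCausalLM

# Base model named in this card; loading it requires substantial memory.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

# Freeze everything, then re-enable gradients for the parameter groups
# the card lists as unfrozen: lm_head, embed_tokens, mlp, and self_attn.
for param in model.parameters():
    param.requires_grad = False

unfrozen_keys = ("lm_head", "embed_tokens", "mlp", "self_attn")
for name, param in model.named_parameters():
    if any(key in name for key in unfrozen_keys):
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```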

Good For

  • Developers requiring a highly compliant and uncensored model for various tasks.
  • Applications needing strong instruction following and conversational capabilities.
  • Use cases involving code generation, translation, and feedback.
  • Experimentation with agentic workflows and function calling.