Dolphin 2.6 Mistral 7b DPO: Uncensored and Compliant
Dolphin 2.6 Mistral 7b DPO is a 7-billion-parameter language model built on the Mistral-7b architecture, with a 4096-token context window. Developed by dphn and sponsored by Convai, this iteration adds DPO (Direct Preference Optimization) tuning using the argilla/ultrafeedback-binarized-preferences-cleaned dataset.
Key Capabilities & Characteristics
- Enhanced Coding Performance: The model has been trained with a significant amount of coding data, making it particularly proficient in coding tasks.
- High Compliance & Uncensored: DPO tuning has made the model highly compliant with user instructions. The model is uncensored: its training data was filtered to remove alignment and bias, so it may comply even with unethical requests. Users are advised to implement their own alignment layer before exposing the model as a service.
- ChatML Format: Utilizes the ChatML prompt format, with <|im_end|> mapped to token_id 2 for broader compatibility.
- Performance Benchmarks: Achieves an average score of 67.20 on the Open LLM Leaderboard, including 65.61 on the AI2 Reasoning Challenge and 63.24 on MMLU.
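To make the ChatML convention above concrete, here is a minimal sketch of how a prompt for this model might be assembled. The helper name `format_chatml` and the example system prompt are illustrative, not part of the model card; in practice you may prefer a tokenizer's built-in chat template if one is provided.

```python
def format_chatml(system: str, user: str) -> str:
    """Build a ChatML-style prompt: each turn is wrapped in
    <|im_start|>role ... <|im_end|> markers, and the prompt ends with an
    open assistant turn so the model continues from there."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = format_chatml(
    "You are Dolphin, a helpful AI assistant.",  # illustrative system prompt
    "Write a quicksort in Python.",
)
print(prompt)
```

Generation should then be stopped on the `<|im_end|>` token (token_id 2), which is why mapping it to the base model's EOS id aids compatibility with common inference stacks.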
Training Details
- Trained for 3 epochs over 2 days on 4x A100 GPUs, using a full-weights finetune with Axolotl.
Future Enhancements (Dolphin 3.0)
Planned enhancements for Dolphin 3.0 include improved general chat, structured output, agent use cases (such as Autogen, MemGPT, and function calling), and role-playing.